4 Rolling, cumulative, and lagged/leading values

Windowing operations are typically defined as calculations performed over a sliding partition of an array – for instance, rolling means and sums. Other useful operations include accumulating values in a sequence, computing leading or lagged values, and so on. What they have in common is that all these operations involve calculations based on specific positions within an array. That’s why I call them indexing operations. In the following sections, we’ll explore applications of this type of operation.

4.1 Rolling means

Rolling operations are commonly used to smooth volatile time series or to mitigate seasonal effects. Consider the Google Mobility data for Brazil in 2021 we saw in the first chapter. Remember that this data set has daily frequency and that mobility in workplaces is higher on weekdays. Therefore, a simple strategy to remove this seasonal pattern is to take the 7-day rolling mean.

For this, we can use the roll_mean function from the RcppRoll package. In addition to the mean, the package provides functions to compute several other rolling statistics – minimum/maximum, median, standard deviations, and so on. Also, instead of setting the align parameter inside the function call, we can append the suffix l(eft), c(enter), or r(ight) to the function name (for example, roll_meanr) to control how the window is aligned in the calculations.

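library(tidyverse)

# Read the 2021 Brazil report directly from the zipped collection of CSV files
gmob_data_br <- read_csv(
  unz(
    'data/Region_Mobility_Report_CSVs.zip', '2021_BR_Region_Mobility_Report.csv'
  )
) |> 
  # Keep only the country-level rows (those with no sub-region)
  filter(is.na(sub_region_1)) |> 
  select(date, mobility_workplaces = contains('workplaces'))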

library(RcppRoll)
gmob_data_br_7dma <- gmob_data_br |>
  arrange(date) |> 
  mutate(
    mobility_workplaces_7dma = roll_meanr(
      mobility_workplaces, 
      n = 7, 
      na.rm = TRUE
    )
  )

gmob_data_br_7dma |> 
  ggplot(aes(x = date)) +
  geom_line(aes(y = mobility_workplaces, color = 'Mobility in Workplaces'), lwd = 1) +
  geom_line(aes(y = mobility_workplaces_7dma, color = 'Mobility in Workplaces - 7d MA'), lwd = 1) +
  theme(legend.position = 'top') +
  labs(
    title = 'Brazil: Mobility in workplaces (% change from baseline)',
    x = '',
    y = '',
    color = ''
  )
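
To see how the alignment suffixes behave, a minimal sketch on a toy vector may help. The vector x below is made up for illustration and is not part of the mobility data; the outputs shown in the comments assume the default fill = NA of the suffixed RcppRoll functions.

```r
library(RcppRoll)

# Hypothetical toy series, used only to illustrate window alignment
x <- c(1, 2, 3, 4, 5)

# Right-aligned: each result averages the current value and the two before it
right <- roll_meanr(x, n = 3)
right
# NA NA  2  3  4

# Centered: each result averages the previous, current, and next values
center <- roll_meanc(x, n = 3)
center
# NA  2  3  4 NA

# Left-aligned: each result averages the current value and the two after it
left <- roll_meanl(x, n = 3)
left
#  2  3  4 NA NA
```

With daily data and n = 7, the right-aligned version uses only the current day and the six preceding days, so no future observation enters the average.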

4.2 Accumulated in n-periods

Using the rolling mean to smooth out very volatile time series or to mitigate seasonal patterns is a natural choice when we are interested in the level of the series. However, when dealing with ratios such as percentage changes, the most appropriate procedure is to compute accumulated values over twelve months for monthly series or over four quarters for quarterly series. Because percentage changes compound, they are accumulated by multiplying the gross rates (1 + change/100) over the window rather than by averaging them. For instance, consider the monthly US CPI data we saw in the first chapter.

Data

The monthly US CPI data set is available in the R4ER2data package under the name cpi_us.

cpi_tbl <- R4ER2data::cpi_us |> 
  mutate(
    date = as.Date(date)
  )

cpi_12m <- cpi_tbl |> 
  arrange(date) |> 
  mutate(
    value_12m = (roll_prodr(1 + (value / 100), n = 12) - 1) * 100
  )

cpi_12m |> 
  ggplot(aes(x = date)) +
  geom_line(aes(y = value_12m), lwd = 1) +
  theme(legend.position = 'top') +
  labs(
    title = 'US: CPI accumulated in 12-months (%)',
    x = '',
    y = '',
    color = ''
  )
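
The compounding behind the 12-month figure can be checked on a toy example. The sketch below assumes a hypothetical series with a constant 1% change in each of 12 months (not the actual CPI data) and compares the compounded result with the simple sum of the monthly changes.

```r
library(RcppRoll)

# Hypothetical monthly percentage changes: 1% in each of 12 months
monthly_pct <- rep(1, 12)

# Compounded change over a right-aligned 12-month window
acc_12m <- (roll_prodr(1 + monthly_pct / 100, n = 12) - 1) * 100

# Only the 12th position has a complete window; the first 11 are NA
round(acc_12m[12], 2)
# 12.68

# The simple sum ignores compounding and understates the accumulated change
sum(monthly_pct)
# 12
```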

4.3 From changes to level

Sometimes we are more interested in the level of a series rather than its variation. This is particularly useful when we have reason to believe that the data should lie within a given range or revert to an expected path. To obtain the level of a series from its changes, all we need to do is accumulate those changes over time.¹ Using the US CPI data, we have:

cpi_level <- cpi_tbl |> 
  arrange(date) |> 
  mutate(
    value_level = cumprod(1 + (value / 100)),
    value_level = (value_level / first(value_level)) * 100
  )

cpi_level |> 
  ggplot(aes(x = date)) +
  geom_line(aes(y = value_level), lwd = 1) +
  theme(legend.position = 'top') +
  scale_x_date(date_breaks = '1 year', date_labels = '%Y') +
  labs(
    title = 'US: CPI in level (Jan/2010 = 100)',
    x = '',
    y = '',
    color = ''
  )

Looking at the series in level makes it easier for the analyst to conjecture possible scenarios for inflation. For example, it could either remain constant by extrapolating the last value or gradually return to the pre-COVID path.
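
As a rough check of the accumulation step, the sketch below applies the same cumprod logic to a hypothetical three-month series of percentage changes (made-up numbers, not the CPI data). It then rebases the index to a different reference period, here the second observation, which is an arbitrary choice for illustration.

```r
# Hypothetical monthly percentage changes
pct_change <- c(1, 2, -1)

# Accumulate the gross rates to obtain an unscaled level
level <- cumprod(1 + pct_change / 100)

# Index with the first observation as the reference (= 100)
index_first <- level / level[1] * 100
round(index_first, 2)
# 100.00 102.00 100.98

# Same series with the second observation as the reference (= 100)
index_second <- level / level[2] * 100
round(index_second, 2)
#  98.04 100.00  99.00
```

Changing the reference period rescales the whole series by a constant, so the shape of the path is unchanged; only the point labelled 100 moves.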

4.4 Lagged and leading values

Leads and lags of a time series are generally used in regressions but occasionally appear in graphs that aim to compare two or more series that have a non-contemporaneous relationship. Also, knowing how to refer to past or future values of a series can be useful for performing calculations – such as computing changes from a baseline. The lead and lag functions from the dplyr package make this task very straightforward.

library(tidyverse)
cpi_lag_lead <- cpi_tbl |> 
  # lag() and lead() shift by row position, so sort by date first
  arrange(date) |> 
  mutate(
    value_lag1  = lag(value, 1),
    value_lag6  = lag(value, 6),
    value_lead2 = lead(value, 2)
  )
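
To make the shifting explicit, here is a minimal sketch on a made-up table (the values and the names toy, period, and diff_1 are illustrative only). It also shows one possible calculation of the kind mentioned above: the change relative to the previous observation.

```r
library(dplyr)

# Hypothetical data, used only to illustrate lag() and lead()
toy <- tibble(
  period = 1:5,
  value  = c(10, 12, 15, 11, 14)
)

toy_shifted <- toy |>
  arrange(period) |>
  mutate(
    value_lag1  = lag(value, 1),         # previous period; first row is NA
    value_lead2 = lead(value, 2),        # two periods ahead; last two rows are NA
    diff_1      = value - lag(value, 1)  # change from the previous period
  )

toy_shifted$value_lag1
# NA 10 12 15 11
toy_shifted$value_lead2
# 15 11 14 NA NA
toy_shifted$diff_1
# NA  2  3 -4  3
```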

¹ Here, we used the starting value of the series as the reference level (value = 100). But we could have used another part of the series as the reference instead.