library(tidyverse)
gmob_data_br <- read_csv(
  unz(
    'data/Region_Mobility_Report_CSVs.zip', "2021_BR_Region_Mobility_Report.csv"
  )
) |>
  filter(is.na(sub_region_1)) |>
  select(date, mobility_workplaces = contains('workplaces'))
4 Rolling, cumulative, and lagged/leading values
Windowing operations are typically defined as calculations performed over a sliding partition of an array – for instance, rolling means and sums. Other useful operations include accumulating values in a sequence, computing leading or lagged values, and so on. What they have in common is that all of them involve calculations based on specific positions within an array. That's why I call them indexing operations. In the following sections, we'll explore applications of this type of operation.
4.1 Rolling means
Rolling operations are commonly used to smooth volatile time series or to mitigate seasonal effects. Consider the Google Mobility data for Brazil in 2021 we saw in the first chapter. Remember that this data set has daily frequency and that mobility in workplaces is higher on weekdays. Therefore, a simple strategy to remove this seasonal pattern is to take the 7-day rolling mean.
For this, we can use the roll_mean function from the RcppRoll package. In addition to the mean, the package provides functions to compute several other rolling statistics – minimum/maximum, median, standard deviation, and so on. Also, instead of setting the align parameter inside the function call, we can append the suffix l(eft), c(enter), or r(ight) to the function name (for example, roll_meanr) to control how the window is aligned in the calculations.
library(RcppRoll)
gmob_data_br_7dma <- gmob_data_br |>
  arrange(date) |>
  mutate(
    mobility_workplaces_7dma = roll_meanr(
      mobility_workplaces, n = 7,
      na.rm = TRUE
    )
  )
gmob_data_br_7dma |>
  ggplot(aes(x = date)) +
  geom_line(aes(y = mobility_workplaces, color = 'Mobility in Workplaces'), lwd = 1) +
  geom_line(aes(y = mobility_workplaces_7dma, color = 'Mobility in Workplaces - 7d MA'), lwd = 1) +
  theme(legend.position = 'top') +
  labs(
    title = 'Brazil: Mobility in workplaces (% change from baseline)',
    x = '',
    y = '',
    color = ''
  )
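To make the effect of window alignment concrete, here is a minimal sketch in base R. The helper roll_mean_base is hypothetical (it is not part of RcppRoll) and only mimics the idea of right, center, and left alignment for an odd window size, padding incomplete windows with NA. Conventions for even window sizes may differ between implementations, so the sketch avoids them.

```r
# Hypothetical helper, written only for illustration (not an RcppRoll function).
# Computes a rolling mean of size n and places each result according to `align`.
roll_mean_base <- function(x, n, align = c("right", "center", "left")) {
  align <- match.arg(align)
  len <- length(x)
  out <- rep(NA_real_, len)
  # How many positions before the current one the window starts
  offset <- switch(align,
    right  = n - 1,          # window ends at the current observation
    center = (n - 1) %/% 2,  # window is centred on the current observation
    left   = 0               # window starts at the current observation
  )
  for (i in seq_len(len)) {
    start <- i - offset
    end <- start + n - 1
    # Only complete windows produce a value; the rest stay NA
    if (start >= 1 && end <= len) {
      out[i] <- mean(x[start:end])
    }
  }
  out
}

x <- c(1, 2, 3, 4, 5, 6, 7)
right_aligned  <- roll_mean_base(x, n = 3, align = "right")
center_aligned <- roll_mean_base(x, n = 3, align = "center")
left_aligned   <- roll_mean_base(x, n = 3, align = "left")
```

The three vectors contain the same means and differ only in where the NA padding falls. With right alignment, each value uses only current and past observations, which is usually what we want for a daily series such as the mobility data.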
4.2 Accumulated in n-periods
Using the rolling mean to smooth out very volatile time series or to mitigate seasonal patterns is a natural choice when we are interested in the level of the series. However, when dealing with ratios, such as percentage changes, the more appropriate procedure is to compute the value accumulated over twelve months for monthly series or over four quarters for quarterly series. For instance, consider the monthly US CPI data we saw in the first chapter.
The monthly US CPI data set is available in the R4ER2data package under the name cpi_us.
cpi_tbl <- R4ER2data::cpi_us |>
  mutate(
    date = as.Date(date)
  )

cpi_12m <- cpi_tbl |>
  arrange(date) |>
  mutate(
    value_12m = (roll_prodr(1 + (value / 100), n = 12) - 1) * 100
  )
cpi_12m |>
  ggplot(aes(x = date)) +
  geom_line(aes(y = value_12m), lwd = 1) +
  theme(legend.position = 'top') +
  labs(
    title = 'US: CPI accumulated in 12-months (%)',
    x = '',
    y = '',
    color = ''
  )
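The accumulation above compounds the monthly changes rather than adding them up. The following base R sketch shows the arithmetic on made-up numbers. The values in monthly_pct and the helper roll_compound are hypothetical, used only to illustrate the calculation with a short window.

```r
# Made-up monthly percentage changes, for illustration only
monthly_pct <- c(1, 2, 3, 4)

# Compounding the first three months: (1.01 * 1.02 * 1.03 - 1) * 100
compounded_3m <- (prod(1 + monthly_pct[1:3] / 100) - 1) * 100
# Simply adding the changes ignores the compounding effect
simple_sum_3m <- sum(monthly_pct[1:3])

# Hypothetical helper: compounded change over the last n periods,
# right-aligned, with NA while the window is still incomplete
roll_compound <- function(pct, n) {
  vapply(seq_along(pct), function(i) {
    if (i < n) return(NA_real_)
    (prod(1 + pct[(i - n + 1):i] / 100) - 1) * 100
  }, numeric(1))
}

rolling_3m <- roll_compound(monthly_pct, n = 3)
```

Here the compounded three-period change (about 6.11%) is slightly larger than the simple sum (6%). The gap widens as the individual changes get larger or the window gets longer.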
4.3 From changes to level
Sometimes we are more interested in the level of a series than in its variation. This is particularly useful when we have reason to believe that the data should lie within a given range or revert to an expected path. To obtain the level of a series from its changes, all we need to do is accumulate those changes over time.[1] Using the US CPI data, we have:
cpi_level <- cpi_tbl |>
  arrange(date) |>
  mutate(
    value_level = cumprod(1 + (value / 100)),
    value_level = (value_level / first(value_level)) * 100
  )
cpi_level |>
  ggplot(aes(x = date)) +
  geom_line(aes(y = value_level), lwd = 1) +
  theme(legend.position = 'top') +
  scale_x_date(date_breaks = '1 year', date_labels = '%Y') +
  labs(
    title = 'US: CPI in level (Jan/2010 = 100)',
    x = '',
    y = '',
    color = ''
  )
Looking at the series in level makes it easier for the analyst to conjecture possible scenarios for inflation. For example, it could either remain constant by extrapolating the last value or gradually return to the pre-COVID path.
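The choice of reference period only rescales the level series. As a minimal sketch in base R, using made-up percentage changes (pct) and a hypothetical base position (base_pos), the same index can be rebased to any observation without altering the implied period-to-period changes:

```r
# Made-up percentage changes, for illustration only
pct <- c(0.5, 0.2, -0.1, 0.3)

# Accumulate the changes into a level
level <- cumprod(1 + pct / 100)

# Reference = first observation (first value becomes 100)
level_first <- level / level[1] * 100

# Reference = another observation; base_pos is a hypothetical choice
base_pos <- 3
level_rebased <- level / level[base_pos] * 100

# The ratio between consecutive observations is the same in both versions
growth_first   <- level_first[4] / level_first[3]
growth_rebased <- level_rebased[4] / level_rebased[3]
```

Both versions imply the same 0.3% change between the third and fourth observations. Only the point at which the index equals 100 moves.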
4.4 Lagged and leading values
Leads and lags of a time series are generally used in regressions but occasionally appear in graphs that compare two or more series with a non-contemporaneous relationship. Also, knowing how to refer to past or future values of a series can be useful for performing calculations – such as computing changes from a baseline. The lead and lag functions from the dplyr package make this task straightforward.
library(tidyverse)
cpi_lag_lead <- cpi_tbl |>
  mutate(
    value_lag1 = lag(value, 1),
    value_lag6 = lag(value, 6),
    value_lead2 = lead(value, 2)
  )
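To see what shifting a series does position by position, the sketch below reproduces the idea in base R on a made-up vector. The helpers lag_base and lead_base are hypothetical stand-ins written for illustration, not the dplyr functions themselves. They assume the data are already sorted by date and that the shift k is smaller than the length of the series.

```r
# Hypothetical base R stand-ins for illustration (not the dplyr functions)
lag_base <- function(x, k = 1) {
  stopifnot(k >= 1, k < length(x))
  c(rep(NA, k), head(x, -k))   # shift values forward; pad the start with NA
}
lead_base <- function(x, k = 1) {
  stopifnot(k >= 1, k < length(x))
  c(tail(x, -k), rep(NA, k))   # shift values backward; pad the end with NA
}

# Made-up values, assumed to be ordered in time
x <- c(10, 12, 15, 11, 14)

x_lag1  <- lag_base(x, 1)
x_lead2 <- lead_base(x, 2)

# Change relative to the previous observation
change_1 <- x - x_lag1
```

Because the result depends entirely on row order, it is worth sorting by date before computing lags or leads on a data frame.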
[1] Here, we used the starting value of the series as the reference level (value = 100). But we could have used another part of the series as the reference instead.