9 Single-Equation Models

A fundamental part of economic research consists of estimating relationships between variables. A good starting point is to use reduced-form specifications derived from well-established models in the literature. For example, to analyze the effect of a given variable on economic activity, we can rely on a specification derived from the IS curve. Similarly, we can use a variation of the Phillips Curve to assess the impact of a particular variable on inflation.

In this section, we’ll estimate the parameters of a simple reduced-form Phillips Curve, check the model’s basic properties, and extract useful insights from it. It’s worth noting that the same procedures apply to any other custom specification.

For this exercise, we’ll use seasonally adjusted quarterly data from the Brazilian economy, covering the period from 2004Q1 to 2022Q4.

Data

The data set containing the variables for this exercise is available in the R4ER2data package under the name br_economy_model.

The basic Phillips Curve can be written as:

\[ \pi_t = \beta_1\pi_{t-1} + \beta_2\pi^{e}_{t,t+4|t} + \beta_3\Delta e_{t-1} + \beta_4\tilde{y}_{t-1} + \epsilon_t \]

where:

\(\pi_t\) is a measure of inflation;
\(\pi^{e}_{t,t+4|t}\) is the expected inflation in period \(t+4\), as seen from period \(t\);
\(e\) is a measure of exchange rate or imported inflation;
\(\tilde{y}\) is a proxy for the output gap.

In this exercise:

\(\pi_t\) corresponds to a measure of core inflation that excludes food-at-home and regulated prices (CPI_CORE);
\(\pi^{e}\) refers to market expectations compiled by the Brazilian Central Bank (CPI_EXP);
\(e\) is an index of commodity prices in USD (CI_USD);
\(\tilde{y}\) is the cyclical component extracted from the GDP series using the HP filter (YGAP).

Let’s start by importing the dataset and visualizing the variables of interest.

library(tidyverse)
cp_data <- R4ER2data::br_economy_model 

cp_data |> 
  select(date, CPI_CORE, CPI_EXP, YGAP, CI_USD) |> 
  pivot_longer(-date, names_to = 'var', values_to = 'value') |> 
  ggplot(aes(x = date, y = value)) +
  geom_line(lwd = 1) +
  theme_light() +
  facet_wrap(~ var, scales = 'free_y') +
  labs(
    title = 'Phillips Curve variables',
    x = '',
    y = ''
  )

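Next, we need to construct the necessary variables: the lagged values of CPI_CORE and YGAP, and the log-difference of CI_USD (in percent). Then we fit the model to the data. Note that in the code below the commodity price term enters contemporaneously, rather than with a lag as in the equation above.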
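To make the two transformations concrete, here is a minimal base-R sketch on a made-up index (the numbers are hypothetical and not taken from the dataset). It shows what a one-period lag looks like, and how the log-difference in percent compares with a simple percentage change. The leading NA created by the lag is the reason one observation is dropped in the regression output.

```r
# Hypothetical index values, for illustration only (not from the dataset)
ci <- c(100, 110, 121, 108.9)

# One-period lag: shift the series by one position and pad with NA,
# which mirrors what dplyr::lag(x, 1) returns
ci_lag <- c(NA, head(ci, -1))

# Log-difference in percent, the transformation used for dlog_CI_USD
dlog_ci <- log(ci / ci_lag) * 100

# Simple percentage change, for comparison
pct_ci <- (ci / ci_lag - 1) * 100

round(cbind(ci, ci_lag, dlog_ci, pct_ci), 2)
```

For small changes the two measures are close; they drift apart as the change gets larger, which is worth keeping in mind when reading the coefficient on dlog_CI_USD.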

At this stage, we are imposing no restrictions on the coefficients, although the structural version of the Phillips curve typically does – we’ll address how to impose such restrictions in the next section. Additionally, we are estimating the model using Ordinary Least Squares (OLS), even though a method that is robust to endogeneity, such as the Generalized Method of Moments (GMM), would be more appropriate.

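# Build the regressors: one-quarter lags and the log-difference (in %) of CI_USD
cp_reg_data <- cp_data |> 
  select(date, CPI_CORE, CPI_EXP, CI_USD, YGAP) |> 
  mutate(
    CPI_CORE_lag = dplyr::lag(CPI_CORE, 1),
    YGAP_lag     = dplyr::lag(YGAP, 1),
    dlog_CI_USD  = log(CI_USD / dplyr::lag(CI_USD, 1)) * 100
  )

# OLS without an intercept ("- 1"), matching the specification above
cp_fit <- lm(
  CPI_CORE ~ CPI_CORE_lag + CPI_EXP + YGAP_lag + dlog_CI_USD - 1,
  data = cp_reg_data
)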

summary(cp_fit)


Call:
lm(formula = CPI_CORE ~ CPI_CORE_lag + CPI_EXP + YGAP_lag + dlog_CI_USD - 
    1, data = cp_reg_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.79874 -0.19115 -0.03149  0.16141  0.85059 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
CPI_CORE_lag 0.656741   0.092798   7.077 8.48e-10 ***
CPI_EXP      0.089305   0.026820   3.330 0.001381 ** 
YGAP_lag     0.031710   0.011866   2.672 0.009339 ** 
dlog_CI_USD  0.017335   0.004462   3.885 0.000227 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3169 on 71 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9551,    Adjusted R-squared:  0.9526 
F-statistic: 377.6 on 4 and 71 DF,  p-value: < 2.2e-16

The estimated coefficients are highly significant and exhibit the expected signs. To assess the model’s validity, it is crucial that the residuals have a zero mean and display no systematic patterns over time. The checkresiduals() function from the forecast package provides a convenient diagnostic summary of the model residuals.

forecast::checkresiduals(cp_fit)


    Breusch-Godfrey test for serial correlation of order up to 10

data:  Residuals
LM test = 24.948, df = 10, p-value = 0.005445

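Visually, the residuals look well-behaved: their mean is approximately zero, there are no obvious outliers, and there is no visible trend. The ACF plot shows mild autocorrelation at lag 3, which is small enough to be overlooked for the purposes of this exercise. Note, however, that the Breusch-Godfrey test above rejects the null hypothesis of no serial correlation up to order 10 (p-value of about 0.005), so the reported standard errors should be read with some caution.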
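For readers who want to see what sits behind such a summary, the following is a rough base-R sketch of similar checks. It runs on simulated stand-in data (hypothetical, not the Brazilian series) and uses the Ljung-Box test from the stats package, which is a different test from the Breusch-Godfrey test reported by checkresiduals(), although both look for serial correlation in the residuals.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in data (hypothetical values)
n <- 80
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 0.3)

# No-intercept regression, as in the Phillips Curve specification
fit <- lm(y ~ x - 1)
res <- residuals(fit)

# 1. Mean of the residuals: not forced to zero when the intercept is dropped
res_mean <- mean(res)

# 2. Sample autocorrelations at lags 1 to 10 (lag 0 is always 1, so drop it)
res_acf <- acf(res, lag.max = 10, plot = FALSE)$acf[-1]

# 3. Ljung-Box test: the null hypothesis is no autocorrelation up to lag 10
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

res_mean
round(res_acf, 2)
lb$p.value
```

A small p-value in step 3 would point to leftover serial correlation, in the same spirit as the Breusch-Godfrey result.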

Treating the model as adequate for illustrative purposes, we can now put it to use. The estimated coefficients offer useful rules of thumb for practical interpretation. For example, the coefficient on dlog_CI_USD captures the pass-through from imported prices to inflation. According to the model, a 10% increase in imported prices is associated with a rise of roughly 0.17 p.p. in inflation in the current quarter.
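The 0.17 p.p. figure treats the coefficient as a linear rule of thumb. Since the regressor is a log-difference, a slightly more careful calculation converts the 10% change into log points first. The short sketch below does both; the coefficient is copied from the regression output and the 10% shock is just the example used in the text.

```r
# Coefficient on dlog_CI_USD, copied from the regression output
beta_ci <- 0.017335

# Example shock: a 10% increase in imported prices
shock_pct <- 10

# Rule of thumb: multiply the percentage change by the coefficient
impact_linear <- beta_ci * shock_pct

# Log-difference version: a 10% increase is log(1.10) * 100 log points
shock_dlog <- log(1 + shock_pct / 100) * 100
impact_log <- beta_ci * shock_dlog

round(c(linear = impact_linear, log_based = impact_log), 3)
```

The two numbers are close, so the rule of thumb is a reasonable shortcut for moderate shocks.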

Plotting the model’s fitted values is a helpful way to assess deviations of the target variable from its fundamentals – at least those included in the specification. For this, we can use the augment() function from the broom package, which returns a data frame containing the fitted values, residuals, and other outputs. The broom package also includes additional functions that greatly facilitate the manipulation and presentation of regression results, as we’ll explore later.

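library(broom)
# augment() returns the fitted values and residuals for the rows used by lm();
# the .rownames column lets us recover the matching dates
cp_fit_plot <- cp_fit |> 
  augment() |> 
  left_join(
    cp_reg_data |> 
      select(date) |> 
      rowid_to_column(var = '.rownames') |> 
      mutate(.rownames = as.character(.rownames)),
    by = '.rownames'
  ) |> 
  mutate(deviation = CPI_CORE - .fitted)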

cp_fit_plot |> 
  ggplot(aes(x = date)) +
  geom_line(aes(y = CPI_CORE, color = 'Actual'), lwd = 1) +
  geom_line(aes(y = .fitted, color = 'Model'), lwd = 1) +
  geom_col(aes(y = deviation, fill = 'Deviation (Actual - Fitted)')) +
  theme_light() +
  theme(legend.position = 'top') +
  scale_fill_manual(values = 'darkgrey') +
  labs(
    title = 'CPI Core: Actual vs. Fitted (%QoQ SA)',
    x = '',
    y = '%',
    color = '',
    fill = ''
  )

What has been the role of economic activity in driving inflation in recent quarters? And what about external factors – have they played a significant part in the overall outcome?

A common practice in applied macroeconometrics is to quantify the contribution of each explanatory variable to the observed value of the dependent variable in a given period. This decomposition is obtained by multiplying the value of each variable in the period by its corresponding model coefficient. The difference between the observed value of the left-hand variable and the sum of all contributions is then attributed to the residual.
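The logic of the decomposition can be checked in a few lines of base R. The sketch below uses simulated stand-in data with two hypothetical regressors (x1 and x2 are illustrative names, not variables from the dataset) and verifies that the contributions plus the residual add up to the observed value in every period.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in data (hypothetical values)
n  <- 40
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 0.6 * df$x1 + 0.2 * df$x2 + rnorm(n, sd = 0.1)

# No-intercept regression, as in the Phillips Curve specification
fit <- lm(y ~ x1 + x2 - 1, data = df)

# Regressor matrix used in the fit; its columns line up with coef(fit)
X <- model.matrix(fit)

# Multiply each column by its coefficient to get per-period contributions
contrib <- sweep(X, 2, coef(fit), `*`)

# The residual collects everything the regressors do not explain
decomp <- cbind(contrib, residual = residuals(fit))

# Contributions plus residual reproduce the observed series
all.equal(unname(rowSums(decomp)), df$y)
```

Because the identity holds by construction, a large residual bar in a given period is a direct signal that factors outside the model were at work.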

To implement this, we can use the tidy() function from the broom package, which returns the estimated coefficients in a tidy (i.e. data frame) format that is convenient for further manipulation and merging with the original dataset.

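cp_decomp <- cp_fit_plot |> 
  # keep the date and the regressors that appear in the model
  select(date, all_of(names(coef(cp_fit)))) |> 
  pivot_longer(-date, names_to = 'term', values_to = 'value') |> 
  # attach the estimated coefficient of each regressor
  left_join(
    cp_fit |> 
      broom::tidy() |> 
      select(term, estimate),
    by = 'term'
  ) |> 
  mutate(contribution = value * estimate) |> 
  # add the residual as a separate contribution
  bind_rows(
    cp_fit_plot |> 
      select(date, contribution = .resid) |> 
      mutate(term = 'residual')
  )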

cp_decomp |> 
  ggplot(aes(x = date)) +
  geom_col(aes(y = contribution, fill = term)) +
  theme_light() +
  scale_fill_brewer(type = 'qual', palette = 6) +
  labs(
    title = 'Contribution of each variable to Core CPI (p.p)',
    x = '', 
    y = '', 
    fill = 'Variable'
  )

We can see that inertia and expectations are the main drivers of inflation throughout the sample, although in specific periods, economic activity and imported inflation also played a significant role. I included the residual term as well, because it is important to identify when factors not incorporated in the model have influenced the outcome – and to assess the magnitude of their contribution to the overall result.

Naturally, the model can also be used to generate forecasts. To do so, we need to provide values for the right-hand-side variables, including the lagged value of CPI_CORE, which, in the case of multi-period forecasts, introduces a recursive structure. In the next section, we’ll present a more complete approach to building future scenarios. For now, I’ll simply take the most recent value of each variable and add a random perturbation drawn from a standard normal distribution.
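To illustrate the recursive structure, here is a minimal base-R sketch of a four-quarter forecast. The coefficients are copied from the regression output, while the scenario paths and the starting value (scenario, cpi_last) are hypothetical numbers chosen only for illustration. Each forecast is fed back in as the lagged inflation term of the following quarter.

```r
# Point estimates copied from the regression output
b <- c(CPI_CORE_lag = 0.656741, CPI_EXP = 0.089305,
       YGAP_lag = 0.031710, dlog_CI_USD = 0.017335)

# Hypothetical scenario for the next four quarters (illustrative numbers only)
h <- 4
scenario <- data.frame(
  CPI_EXP     = rep(1.0, h),
  YGAP_lag    = c(0.5, 0.3, 0.1, 0.0),
  dlog_CI_USD = c(2, 0, 0, 0)
)
cpi_last <- 1.2  # hypothetical last observed value of CPI_CORE

# Recursive forecast: each forecast becomes the lag used in the next step
fc   <- numeric(h)
prev <- cpi_last
for (i in seq_len(h)) {
  fc[i] <- b[["CPI_CORE_lag"]] * prev +
    b[["CPI_EXP"]] * scenario$CPI_EXP[i] +
    b[["YGAP_lag"]] * scenario$YGAP_lag[i] +
    b[["dlog_CI_USD"]] * scenario$dlog_CI_USD[i]
  prev <- fc[i]
}

round(fc, 3)
```

The loop makes explicit why errors in early quarters carry over to later ones: through the coefficient on lagged inflation, every step inherits part of the previous forecast.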

set.seed(123)
new_values <- tibble(
  CPI_CORE_lag = last(cp_reg_data$CPI_CORE),
  CPI_EXP      = last(cp_reg_data$CPI_EXP) + rnorm(1),
  YGAP_lag     = last(cp_reg_data$YGAP) + rnorm(1),
  dlog_CI_USD  = last(cp_reg_data$dlog_CI_USD) + rnorm(1)
)

predict(cp_fit, new_values)

       1 
1.341488