Inferential SNA: Peer effect models

Benjamin Rosche and Wicia Fang

Most courses on social network analysis (SNA) focus on descriptive SNA, such as measuring the density of a network, identifying subgroups within a network, or examining the centrality of actors within a network. Inferential SNA, by contrast, is often left out. It seeks to explain how networks form and how the behaviors and beliefs of the actors embedded in them come about.

This short course is an introduction to inferential SNA. The focus of this course is on models to explain how the behaviors and beliefs of actors are influenced by the networks in which they are embedded. Covered are cross-sectional, panel, and dynamic panel models to estimate exogenous and endogenous peer effects.

In the following, we examine under which circumstances the correct peer effects are recovered and under which they are not. To that end, we created four cross-sectional scenarios (dat1.1 through dat1.4) and four panel scenarios (dat2.1 through dat2.4). These datasets are simulated and differ with respect to whether the peer effect is exogenous or endogenous and whether the networks have formed randomly or not. For each of the simulated scenarios, there is a corresponding adjacency matrix (w1.1 through w2.4) describing the network structure.

1. Cross-sectional models

1.1. Exogenous peer effects and random networks

In this first scenario (dat1.1 and w1.1), we consider exogenous peer effects on a social network that has formed randomly.

The network formation process is very simple. Each dyad has a 50% chance of existing:

\(Pr(ij) = 0.5\)

The outcome in all examples is achievement and the explanatory variables are SI and I. Both explanatory variables are randomly and independently normally distributed and there are no confounding influences in the data generation process (DGP).

SI (selection+influence) is a variable that affects both selection into dyads and the outcome, while I (influence) only affects the outcome. However, since network formation is random in 1.1., both variables only affect the outcome:

\(achievement_i = 1SI_i + 1 I_i + 1 \sum_{j}w_{ij}SI_j + 1 \sum_{j}w_{ij}I_j\)

That is, the achievement of i is influenced by her own values of SI and I and by the average SI and I of her peers. The exogenous peer effect is therefore local: individuals are only affected by the peers to whom they are directly connected.
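
To make this DGP concrete, the following base-R sketch simulates a scenario of this kind and checks that a regression that includes the peer averages recovers the coefficients. It is not the code that generated dat1.1: the sample size, the seed, the undirected network, and the names A, W, sim, and fit_slx are assumptions made for illustration.

```r
set.seed(1)
n <- 200

# Random undirected network: each dyad exists with probability 0.5
A <- matrix(0, n, n)
A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, 0.5)
A <- A + t(A)                       # symmetric, zero diagonal

# Row-standardize so that W %*% x is the peer average (as with style = "W")
W <- A / rowSums(A)

# Independent, normally distributed features
SI <- rnorm(n)
I  <- rnorm(n)

sim <- data.frame(
  SI     = SI,
  I      = I,
  lag_SI = as.numeric(W %*% SI),    # average SI of peers
  lag_I  = as.numeric(W %*% I)      # average I of peers
)

# Outcome without an error term, mirroring the equation above
sim$achievement <- 1 * sim$SI + 1 * sim$I + 1 * sim$lag_SI + 1 * sim$lag_I

fit_slx <- lm(achievement ~ SI + I + lag_SI + lag_I, data = sim)
round(coef(fit_slx), 3)
```

Because this simulated outcome contains no error term, all four slope coefficients come out as 1 up to numerical precision.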

lm(achievement ~ SI + I, data=dat1.1) # A: conventional regression model

## 
## Call:
## lm(formula = achievement ~ SI + I, data = dat1.1)
## 
## Coefficients:
## (Intercept)           SI            I  
##      0.0354       1.0127       0.9773

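In (A), the exogenous peer effects (\(SI_j\), \(I_j\)) are omitted. We observe that the effects of ego’s own features (\(SI_i\), \(I_i\)) are nonetheless estimated correctly: because the network is random and the features are independent, the omitted peer averages are essentially uncorrelated with ego’s features.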

lmSLX(
  achievement ~ SI + I, 
  Durbin = ~ SI + I,
  listw = mat2listw(w1.1, style="W"), # adjacency matrix
  data = dat1.1
) # B: Spatial lagged-X model

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Coefficients:
## (Intercept)           SI            I       lag.SI        lag.I  
##  -9.992e-17    1.000e+00    1.000e+00    1.000e+00    1.000e+00

lmSLX(
  achievement ~ SI, 
  Durbin = ~ SI ,
  listw = mat2listw(w1.1, style="W"), # adjacency matrix
  data = dat1.1
) # C: Spatial lagged-X model

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Coefficients:
## (Intercept)           SI       lag.SI  
##    -0.02141      1.03459      1.00996

In (B), we include the exogenous peer effects. We observe that the estimates are perfectly recovered. We observe in (C) that even if covariates are omitted, the remaining individual and exogenous peer effects are still estimated correctly.

Generally, it makes sense to also include the adjacency matrix in the disturbance term as it will produce more appropriate standard errors (and with that, hypothesis tests). This is done in (D):

errorsarlm( # lagsarlm() would add a spatial lag of the outcome rather than a spatial error
  achievement ~ SI + I,
  Durbin = ~ SI + I, # exogenous peer effects
  listw = mat2listw(w1.1, style="W"),
  data = dat1.1
) # D: Spatial Durbin error model (not run)

1.2. Exogenous peer effects and endogenous networks

In dat1.2, we consider a situation in which network formation is endogenous:

\(Pr(ij) = 1 / (1 + exp(-( 1 * similaritySI_{ij} )))\)

That is, dyads are more likely for peers that are similar in SI. The DGP for the outcome is the same:

\(achievement_i = 1SI_i + 1 I_i + 1 \sum_{j}w_{ij}SI_j + 1 \sum_{j}w_{ij}I_j\)

lm(achievement ~ SI + I, data=dat1.2) # A: Conventional regression model, both exogenous peer effects omitted

## 
## Call:
## lm(formula = achievement ~ SI + I, data = dat1.2)
## 
## Coefficients:
## (Intercept)           SI            I  
##      0.0158       1.7799       1.0361

In (A), we observe that the effect of the ego feature I is still estimated correctly with a conventional regression model. The effect of SI, however, is now biased. More generally, the effect of an individual feature that is involved in the selection process will be biased if that feature also has an effect on peers.
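
One way to see where this bias comes from is to simulate homophilous tie formation and look at the correlation between ego's SI and the average SI of her peers. The sketch below is a base-R illustration, not the code behind dat1.2. In particular, the text does not define \(similaritySI_{ij}\), so the negative absolute difference in SI used here is an assumption, as are the sample size, the seed, and all object names.

```r
set.seed(42)
n  <- 300
SI <- rnorm(n)

# Assumed similarity measure: negative absolute difference in SI
similarity_SI <- -abs(outer(SI, SI, "-"))

# Pr(ij) = 1 / (1 + exp(-(1 * similarity_SI)))
p_tie <- plogis(1 * similarity_SI)

# Draw each undirected dyad once, then mirror it
A <- matrix(0, n, n)
upper <- upper.tri(A)
A[upper] <- rbinom(sum(upper), 1, p_tie[upper])
A <- A + t(A)

# Row-standardized adjacency matrix
W <- A / rowSums(A)

# Average SI of each individual's peers
peer_mean_SI <- as.numeric(W %*% SI)

# Under homophily, ego's SI and the peer average of SI are positively correlated
cor(SI, peer_mean_SI)
```

When the peer average of SI is left out of the model, this correlation lets part of its effect load onto ego's own SI, which is the omitted-variable mechanism at work in (A).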

lmSLX(
  achievement ~ SI + I, 
  Durbin = ~ SI,
  listw = mat2listw(w1.2, style="W"),
  data = dat1.2
) # B: Spatial lagged-X model, one exogenous peer effect (I) omitted

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Coefficients:
## (Intercept)           SI            I       lag.SI  
##     0.03845      0.40807      1.00870      1.78274

As we observe in (B), SI is also biased if we do not include all relevant exogenous peer effects. Moreover, the included exogenous peer effect (SI) is likewise biased!

lmSLX(
  achievement ~ SI + I, 
  Durbin = ~ SI + I,
  listw = mat2listw(w1.2, style="W"),
  data = dat1.2
) # C: Spatial lagged-X model

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Coefficients:
## (Intercept)           SI            I       lag.SI        lag.I  
##  -5.551e-17    1.000e+00    1.000e+00    1.000e+00    1.000e+00

In (C), we observe that endogenous network formation will not bias estimates if we capture the true DGP.

1.3. Endogenous peer effect and random networks

In dat1.3, networks are random again: \(Pr(ij) = 0.5\) but, here, we consider an endogenous peer effect. The endogenous peer effect has a global impact on the network since any change in the outcome of an individual will not only affect all direct peers but also the peers of peers. The endogenous peer effect can be conceptualized as a social interaction process, in which the value of the dependent variable for each individual is jointly determined with that of her peers. This is the DGP: \(achievement_i = 0.5\sum_{j}w_{ij}achievement_j + 1SI_i + 1I_i\).
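
Stacking the DGP over all individuals makes the global reach of the endogenous peer effect explicit. The matrix notation is ours and is not used elsewhere in this text: \(y\) is the vector of achievement, \(X\) the matrix holding SI and I, \(\beta\) their coefficients, and \(I_n\) the identity matrix (not the variable I). Assuming a row-standardized \(W\) and \(|\rho|<1\), the DGP can be solved for \(y\):

\(y = \rho W y + X\beta \;\Rightarrow\; y = (I_n - \rho W)^{-1}X\beta = \sum_{k=0}^{\infty}\rho^k W^k X\beta\)

The term with \(k=1\) carries the features of direct peers, the term with \(k=2\) those of peers of peers, and so on, each discounted by a further factor of \(\rho\).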

m13.listw <- mat2listw(w1.3, style="W")

m13 <-
  lagsarlm(
    achievement ~ SI + I,
    listw=m13.listw,
    data=dat1.3
  ) # A: Spatial autoregressive model

m13

## 
## Call:
## lagsarlm(formula = achievement ~ SI + I, data = dat1.3, listw = m13.listw)
## Type: lag 
## 
## Coefficients:
##          rho  (Intercept)           SI            I 
## 5.000000e-01 4.531651e-09 1.000000e+00 1.000000e+00 
## 
## Log likelihood: 1890.228

We observe in (A) that the endogenous peer effect (rho=\(\rho\)) is estimated correctly. The interpretation of the effect differs from exogenous peer effects since the spillover traverses the network. To understand the impact better, we can calculate marginal effects (dy/dx) of the features (x) using the impacts() function:

impacts(m13, listw = m13.listw)

## Impact measures (lag, exact):
##      Direct  Indirect Total
## SI 1.007514 0.9924863     2
## I  1.007514 0.9924863     2

We observe that SI and I each have a direct and an indirect effect. The direct effects are approximately equal to the regression coefficients of SI and I. They are slightly larger (1.0075 rather than 1) because part of ego's influence on her peers feeds back to ego. The indirect effects are the part of the total effect that is due to the ripple effect of the endogenous peer effect. The total effect is \(\beta/(1-\rho)\), so with \(\beta_{SI}=1\) and \(\rho=0.5\) it equals 2, and the indirect effect is approximately \(IND=\beta\rho/(1-\rho)=1\). Note that this is only true for a row-standardized adjacency matrix (i.e., a mean peer effect).
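
These impact measures can be approximated by hand from the matrix \((I_n-\rho W)^{-1}\beta\), whose element \((i,j)\) is the change in the outcome of i for a unit change in the feature of j. The base-R sketch below does this on a simulated random network rather than on w1.3. The seed, the network size, and the object names are assumptions, and the averaging (mean of the diagonal for the direct effect, mean row sum for the total effect) follows our understanding of how such impacts are usually defined.

```r
set.seed(2)
n <- 100

# Random undirected network, row-standardized
A <- matrix(0, n, n)
A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, 0.5)
A <- A + t(A)
W <- A / rowSums(A)

rho  <- 0.5   # endogenous peer effect
beta <- 1     # coefficient of the feature (e.g., SI)

# S[i, j] = d achievement_i / d x_j
S <- solve(diag(n) - rho * W) * beta

direct   <- mean(diag(S))       # average effect of own x on own outcome
total    <- mean(rowSums(S))    # average effect of changing x for everyone
indirect <- total - direct      # part that travels through the network

c(direct = direct, indirect = indirect, total = total)
```

With a row-standardized W, every row of \((I_n-\rho W)^{-1}\) sums to \(1/(1-\rho)\), which is why the total effect equals \(\beta/(1-\rho)\) regardless of the particular network drawn.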

lagsarlm(
  achievement ~ SI,
  listw=m13.listw,
  data=dat1.3
) # B: Spatial autoregressive model in which I is omitted

## 
## Call:
## lagsarlm(formula = achievement ~ SI, data = dat1.3, listw = m13.listw)
## Type: lag 
## 
## Coefficients:
##         rho (Intercept)          SI 
##  0.60682068 -0.01935358  1.02810393 
## 
## Log likelihood: -69.25506

We observe in (B) that the endogenous peer effect (\(\rho\)) is biased if there are omitted variables but the individual feature effect of SI remains unaffected.

Generally, it makes sense to also include the adjacency matrix in the disturbance term as it will produce more appropriate standard errors (and with that, hypothesis tests). This is done in (C):

sacsarlm(
  achievement ~ SI + I,
  listw=m13.listw,
  data=dat1.3
) # C: Spatial autoregressive combined model (not run)

1.4. Endogenous peer effect and endogenous networks

In dat1.4 we consider an endogenous network formation process:

\(Pr(ij) = 1 / (1 + exp(-( 1 * similaritySI_{ij} )))\)

as well as an endogenous peer effect:

\(achievement_i = 0.5\sum_{j}w_{ij}achievement_j + 1I_i + 1SI_i\).

lagsarlm(
  achievement ~ SI + I,
  listw=mat2listw(w1.4, style="W"),
  data=dat1.4
) # A: Spatial autoregressive model with full DGP

## 
## Call:
## lagsarlm(formula = achievement ~ SI + I, data = dat1.4, listw = mat2listw(w1.4, 
##     style = "W"))
## Type: lag 
## 
## Coefficients:
##           rho   (Intercept)            SI             I 
##  5.000000e-01 -1.036521e-09  1.000000e+00  1.000000e+00 
## 
## Log likelihood: 1959.575

We observe in (A) that the endogenous peer effect estimate is correct despite the endogenous network formation if the true DGP is captured.

lagsarlm(
  achievement ~ I,
  listw=mat2listw(w1.4, style="W"),
  data=dat1.4
) # B: Spatial autoregressive model, covariate involved in the selection process (SI) omitted

## 
## Call:
## lagsarlm(formula = achievement ~ I, data = dat1.4, listw = mat2listw(w1.4, 
##     style = "W"))
## Type: lag 
## 
## Coefficients:
##         rho (Intercept)           I 
##  -1.5087776  -0.2677616   1.0456252 
## 
## Log likelihood: -17.04281

lagsarlm(
  achievement ~ SI,
  listw=mat2listw(w1.4, style="W"),
  data=dat1.4
) # C: Spatial autoregressive model, covariate NOT involved in the selection process (I) omitted

## 
## Call:
## lagsarlm(formula = achievement ~ SI, data = dat1.4, listw = mat2listw(w1.4, 
##     style = "W"))
## Type: lag 
## 
## Coefficients:
##         rho (Intercept)          SI 
##  0.53848730 -0.03228798  1.03979048 
## 
## Log likelihood: -69.26538

We observe in (B) that the effects of covariates that are not involved in the selection process (I) are correctly recovered. The endogenous peer effect (\(\rho\)), however, is estimated incorrectly because SI is omitted and the selection process is therefore unobserved.

In (C), we observe that omitting a variable that is not part of the selection process (I), by contrast, is less problematic. The endogenous peer effect is close to the true value of 0.5 and the individual feature effect (SI) is close to the true value of 1.

2. Panel models

2.1 Exogenous peer effects and random networks

We now move to panel data. dat2.1 contains three waves, but there are no temporal trends. That is, the DGP for each wave is:

\(Pr(ijt) = 0.5\) and \(achievement_{it} = 1SI_i + 1I_i + 1\sum_{j}w_{ijt}SI_j + 1\sum_{j}w_{ijt}I_j\)

The advantage of using a panel model is that individual-specific and/or time-specific fixed or random effects can be estimated. Let’s start with a pooled (i.e. cross-sectional) model and omit a time-constant individual-specific effect to observe the advantages of a panel model. Since SI and I do not change across time, we can use them as time-constant unobservables.

lm(
  achievement ~ alter_SI + alter_I, 
  data=dat2.1
) # A: Pooled model

## 
## Call:
## lm(formula = achievement ~ alter_SI + alter_I, data = dat2.1)
## 
## Coefficients:
## (Intercept)     alter_SI      alter_I  
##     -0.1067       0.4424       0.7858

Since lmSLX does not allow including an exogenous peer effect of a variable without also including that variable as a main effect, we use lm() instead and manually compute \(alter_{SI} = WSI\) and \(alter_I = WI\). We observe in (A) that all effects are biased due to the unobserved time-constant individual-specific confounders (SI and I).

We know that a fixed-effect panel model will remove any bias from time-constant individual-specific effects. The drawback of this model is that no time-constant effects can be estimated. Even though SI and I are such variables, the peer effect can nonetheless be estimated because the networks are probabilistic and change randomly across waves. This is an important insight. If networks change across waves, peer effects of time-constant variables can nonetheless be estimated:

plm(
  achievement ~ alter_SI + alter_I, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.1
) # B: Panel fixed effect model, all exogenous peer effects included

## 
## Model Formula: achievement ~ alter_SI + alter_I
## 
## Coefficients:
## alter_SI  alter_I 
##        1        1

plm(
  achievement ~ alter_SI, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.1
) # B: Panel fixed effect model, alter_I omitted

## 
## Model Formula: achievement ~ alter_SI
## 
## Coefficients:
## alter_SI 
##   1.0061

We observe in (B) that the exogenous peer effects are perfectly recovered. The estimated effects are correct regardless of whether all exogenous peer effects are included.
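
The logic of the fixed-effect estimator can be sketched in base R by applying the within transformation by hand: subtracting each individual's mean across waves removes the time-constant terms SI and I, while the peer averages keep varying because the network is redrawn in every wave. The code below simulates its own three-wave panel and is not the code behind dat2.1. The sample size, the seed, the helper simulate_wave(), and all object names are assumptions.

```r
set.seed(3)
n     <- 150
waves <- 3

# Time-constant individual features
SI <- rnorm(n)
I  <- rnorm(n)

# Hypothetical helper: draw a fresh random network and compute peer averages
simulate_wave <- function(wave) {
  A <- matrix(0, n, n)
  A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, 0.5)
  A <- A + t(A)
  W <- A / rowSums(A)
  alter_SI <- as.numeric(W %*% SI)   # alter_SI = W SI
  alter_I  <- as.numeric(W %*% I)    # alter_I  = W I
  data.frame(uid = seq_len(n), wave = wave,
             alter_SI = alter_SI, alter_I = alter_I,
             achievement = SI + I + alter_SI + alter_I)
}
panel <- do.call(rbind, lapply(seq_len(waves), simulate_wave))

# Within transformation: deviations from each individual's mean across waves
within <- function(x) x - ave(x, panel$uid)
panel$y_w        <- within(panel$achievement)
panel$alter_SI_w <- within(panel$alter_SI)
panel$alter_I_w  <- within(panel$alter_I)

# No intercept: demeaned variables have mean zero by construction
fit_within <- lm(y_w ~ 0 + alter_SI_w + alter_I_w, data = panel)
round(coef(fit_within), 3)
```

SI and I never enter the regression, yet both peer effects are identified from the wave-to-wave changes in who is connected to whom.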

The same result is achieved with the spatial panel model in (C). The only difference from plm() is that spml() also models a peer effect on the disturbances.

spml(
  achievement ~ alter_SI + alter_I,
  lag=FALSE, # no endogenous peer effect
  spatial.error="b", # peer effect on disturbances
  model="within", # change within observations is considered (i.e. individual FE)
  effect="individual", # individual FE (vs time FE)
  index = c("uid", "wave"),
  listw=mat2listw(w2.1, style="W"), # is assumed to be constant across time
  data=dat2.1
) # C: Panel spatial error model with individual fixed effects (not run)

2.2 Exogenous peer effects and endogenous networks

In dat2.2, networks form endogenously. Note that the selection process is time-constant:

\(Pr(ij) = 1 / (1 + exp(-( 1 * similaritySI_{ij} )))\)

The DGP for the outcome is the same as before: \(achievement_{it} = 1SI_i + 1I_i + 1\sum_{j}w_{ijt}SI_j + 1\sum_{j}w_{ijt}I_j\)

lm(
  achievement ~ alter_SI + alter_I, 
  data=dat2.2
) # A: Pooled model

## 
## Call:
## lm(formula = achievement ~ alter_SI + alter_I, data = dat2.2)
## 
## Coefficients:
## (Intercept)     alter_SI      alter_I  
##    -0.07823      2.37146     -0.34580

Endogenous network formation amplifies the bias of the pooled model in (A).

plm(
  achievement ~ alter_SI + alter_I, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.2
) # B: Fixed-effect panel model

## 
## Model Formula: achievement ~ alter_SI + alter_I
## 
## Coefficients:
## alter_SI  alter_I 
##        1        1

We observe in (B) that the panel model is not affected by a time-constant selection process. This is another important insight. If we can credibly argue that the selection process is time-constant, then the panel fixed-effect model will not only remove confounding from variables that influence the outcome but also the selection process!

plm(
  achievement ~ alter_SI, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.2 # (! network formation is endogenous)
) # C: Fixed-effect panel model in which alter_I is omitted

## 
## Model Formula: achievement ~ alter_SI
## 
## Coefficients:
## alter_SI 
##   1.1261

We observe in (C), however, that if relevant exogenous peer effects are omitted, the remaining exogenous peer effects will be biased.

plm(
  achievement ~ alter_SI, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.1 # (! network formation is exogenous)
) # D: Fixed-effect panel model in which alter_I is omitted but the network formation process is exogenous

## 
## Model Formula: achievement ~ alter_SI
## 
## Coefficients:
## alter_SI 
##   1.0061

As can be observed in (D), this is not the case if network formation is exogenous.

2.3. Endogenous peer effect and random networks

The DGP in dat2.3 is the following:

\(Pr(ij) = 0.5\)

\(achievement_{it} = 1SI_i + 1I_i + 0.5\sum_{j}w_{ijt}achievement_{jt}\)

lm(
  achievement ~ alter_achievement + SI + I, 
  data=dat2.3
) # A: Pooled model with full DGP

## 
## Call:
## lm(formula = achievement ~ alter_achievement + SI + I, data = dat2.3)
## 
## Coefficients:
##       (Intercept)  alter_achievement                 SI                  I  
##        -4.166e-17          5.000e-01          1.000e+00          1.000e+00

lm(
  achievement ~ alter_achievement, 
  data=dat2.3
) # B: Pooled model with time-constant individual effects omitted

## 
## Call:
## lm(formula = achievement ~ alter_achievement, data = dat2.3)
## 
## Coefficients:
##       (Intercept)  alter_achievement  
##           0.05337           -1.32800

plm(
  achievement ~ alter_achievement, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.3
) # C: Panel fixed-effect model

## 
## Model Formula: achievement ~ alter_achievement
## 
## Coefficients:
## alter_achievement 
##               0.5

We observe in (A), (B), and (C) that, just like with exogenous peer effects, the pooled model only recovers the correct effect estimate if the full DGP is captured. As soon as time-constant unobservables are present, the estimates are incorrect. The panel fixed-effect model, however, estimates the correct endogenous peer effect if networks are random.
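
A panel with an endogenous peer effect can be simulated by solving the DGP for the outcome in each wave and then computing the peer average of that outcome. The base-R sketch below does this and applies the within transformation by hand. It is an illustration under assumptions, not the code behind dat2.3: the sample size, the seed, the helper simulate_wave(), and the way alter_achievement is constructed (as \(W_t\) times the solved outcome) are ours.

```r
set.seed(7)
n     <- 100
waves <- 3
rho   <- 0.5

# Time-constant individual features
SI <- rnorm(n)
I  <- rnorm(n)

# Hypothetical helper: one wave with a freshly drawn random network
simulate_wave <- function(wave) {
  A <- matrix(0, n, n)
  A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, 0.5)
  A <- A + t(A)
  W <- A / rowSums(A)
  # Solve y = rho * W y + SI + I for y
  y <- as.numeric(solve(diag(n) - rho * W, SI + I))
  data.frame(uid = seq_len(n), wave = wave,
             achievement = y,
             alter_achievement = as.numeric(W %*% y))
}
panel <- do.call(rbind, lapply(seq_len(waves), simulate_wave))

# Within transformation removes the time-constant terms SI and I
panel$y_w     <- panel$achievement - ave(panel$achievement, panel$uid)
panel$alter_w <- panel$alter_achievement - ave(panel$alter_achievement, panel$uid)

fit_fe <- lm(y_w ~ 0 + alter_w, data = panel)
coef(fit_fe)
```

Since the simulated outcome has no error term, the demeaned outcome is exactly rho times the demeaned peer average, so the coefficient equals 0.5 up to numerical precision. With noisy data the simultaneity between achievement and alter_achievement would need separate treatment, which this sketch does not address.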

2.4. Endogenous peer effect and endogenous networks

The DGP in dat2.4 is the same as before with the exception that networks form endogenously in a time-constant selection process:

\(Pr(ij) = 1 / (1 + exp(-( 1 * similaritySI_{ij} )))\)

\(achievement_{it} = 1SI_i + 1I_i + 0.5\sum_{j}w_{ijt}achievement_{jt}\)

lm(
  achievement ~ alter_achievement + SI + I, 
  data=dat2.4
) # A: Pooled model with full DGP

## 
## Call:
## lm(formula = achievement ~ alter_achievement + SI + I, data = dat2.4)
## 
## Coefficients:
##       (Intercept)  alter_achievement                 SI                  I  
##         7.692e-17          5.000e-01          1.000e+00          1.000e+00

lm(
  achievement ~ alter_achievement, 
  data=dat2.4
) # B: Pooled model with time-constant individual effects omitted

## 
## Call:
## lm(formula = achievement ~ alter_achievement, data = dat2.4)
## 
## Coefficients:
##       (Intercept)  alter_achievement  
##           0.05541            1.33311

plm(
  achievement ~ alter_achievement, 
  model="within", 
  index = c("uid", "wave"), 
  data=dat2.4
) # C: Panel fixed-effect model

## 
## Model Formula: achievement ~ alter_achievement
## 
## Coefficients:
## alter_achievement 
##               0.5

We observe in (A), (B), and (C) the same pattern as before. The pooled model only recovers the correct results if the DGP is fully captured. The panel fixed effect model, however, estimates the correct endogenous peer effect even if the networks formed endogenously - as long as the selection process is time-constant.

3. Dynamic panel models

Not yet covered.

remotes::install_github("RozetaSimonovska/SDPDmod")

4. Conclusions

Let us draw some conclusions from this exercise:

When are individual effect (ego features) estimates biased?

The effect of individual features (\(\beta\)) will not be biased in the presence of peer effects if networks are random
Endogenous network formation does not bias the effect of individual features (\(\beta\)) in the presence of an endogenous peer effect
However, \(\beta\) will be biased if the individual feature is part of the selection process and influences peers (exogenous peer effect)
These results hold for cross-sectional data and for panel data using an RE estimator. The FE estimator cannot estimate time-constant individual effects

When are peer effect estimates biased?

In the cross-sectional setting:

If networks are random, exogenous peer effects are estimated correctly even if there are omitted variables
If networks are random, the endogenous peer effect is biased if there are omitted variables
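If networks formed endogenously, both exogenous and endogenous peer effects are biased unless the model captures the full DGP; omitting a variable that drives selection is particularly harmful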

In the panel setting:

If networks evolve over time, we can estimate time-constant exogenous peer effects using a FE estimator
If networks are random, the FE model recovers the correct exogenous peer effects even if other exogenous peer effects are omitted
The FE model also recovers correct exogenous peer effects in the presence of time-constant selection effects if all relevant exogenous peer effects are included in the model
The FE model recovers the correct endogenous peer effect as long as the selection process is time-constant (!)

Note that this simulation ignored important complications:

These results only hold for randomly distributed features. Individual features that are themselves affected by peer effects (i.e., \(X=WX\Theta\)) will be affected more by endogenous network formation.
We have not examined what changes if both exogenous and endogenous peer effects are present.