02-21-2017 05:58 PM
I am trying to estimate a VAR model with a panel data set. My data consist of buyer-seller dyads observed over time. For example, consider a buyer-seller dyad (i,j) with buyer i and seller j. I observe B_i,t from the buyer and S_j,t from the seller for t = 1, 2, ..., T. These two variables are simultaneously determined and hence endogenous. For this particular dyad, I can specify a VAR model as follows:
02-28-2017 12:52 PM
I am not particularly familiar with models where
contemporaneous response values appear on the RHS. In any event,
I am going to suggest an alternate model that might capture many
aspects of your problem description while still remaining parsimonious.
Suppose the (i,j)-th buyer/seller pair denotes a panel.
B_it = mu_bi + mu_t + X_t beta_1 + e_it
S_jt = mu_sj + mu_t + W_t beta_2 + e_jt
1. B_it and S_jt denote the i-th buyer and the j-th seller response
values at time t.
2. mu_bi and mu_sj denote the intercept terms
for i-th buyer and j-th seller (fixed effects),
3. mu_t is a bivariate time trend, such as a random walk (taken
to be a random effect term), that is common to all the panels,
4. X_t and W_t are regression variables (there might be overlap but their
coefficients will be different for B and S),
5. e_it and e_jt are independent white noise terms.
This model could be thought of as a bivariate version of panel model
that has fixed panel effects and random time effects.
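For reference, here is one way to write the model above in full, assuming the bivariate trend mu_t is a random walk (the description gives a random walk only as one possibility). The component subscripts 1 and 2, the disturbance eta_t, and its covariance Sigma are notation introduced here for illustration and do not appear in the description above.

```latex
\begin{align*}
B_{it} &= \mu_{b,i} + \mu_{1,t} + X_t \beta_1 + e_{it}, \\
S_{jt} &= \mu_{s,j} + \mu_{2,t} + W_t \beta_2 + e_{jt}, \\
\mu_t  &= (\mu_{1,t}, \mu_{2,t})^{\top}, \qquad
\mu_t = \mu_{t-1} + \eta_t, \qquad
\operatorname{Cov}(\eta_t) = \Sigma .
\end{align*}
```

Under this reading, any link between the buyer and seller series comes only through the shared time effect: an unrestricted Sigma lets the two trend components move together, while e_it and e_jt stay independent.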
I am providing sample code for fitting this model with the SSM procedure.
See the SSM procedure in SAS/ETS documentation for more information about this procedure.
For simplicity of description, suppose that:
there are 100 buyers and 50 sellers,
X variables are X1 to X5
W variables are W1 to W8
Suppose the input data set, say test, has the following:
1. the observations are indexed by a time index (say date) and test is sorted
by the time index.
2. the time index values are equispaced (such as monthly, daily, etc.).
3. there can be multiple observations, say n_t, at a time index t. n_t need not be
the same for all t (i.e., panels can be unbalanced)
4. the input data set already has the necessary intercept dummies:
mu_b1 to mu_b100 are the buyer dummies: mu_bi = (buyer = i);
mu_s1 to mu_s50 are the seller dummies: mu_sj = (seller = j);
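If the dummies are not already present, a DATA step along the following lines could create them. This is only a sketch: it assumes a hypothetical raw data set named raw that contains the variable date and numeric variables buyer (coded 1 to 100) and seller (coded 1 to 50). Only the output name test and the dummy names come from the description above.

```sas
/* Sketch only: "raw", "buyer", and "seller" are assumed names. */
data test;
    set raw;
    array mu_b{100} mu_b1-mu_b100;   /* buyer intercept dummies  */
    array mu_s{50}  mu_s1-mu_s50;    /* seller intercept dummies */
    do k = 1 to 100;
        mu_b{k} = (buyer = k);       /* 1 if this row belongs to buyer k, else 0  */
    end;
    do k = 1 to 50;
        mu_s{k} = (seller = k);      /* 1 if this row belongs to seller k, else 0 */
    end;
    drop k;
run;

/* The input must be sorted by the time index. */
proc sort data=test;
    by date;
run;
```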
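proc ssm data=test;
   /* time index; the INTERVAL= option is optional */
   id date interval=day;
   /* bivariate time effect: identity transition, general disturbance covariance */
   state timeEffect(2) t(I) cov(g) cov1(d);
   /* first element enters the B equation, second enters the S equation */
   comp bTime = timeEffect[1];
   comp sTime = timeEffect[2];
   /* independent white noise terms for the two equations */
   irregular wb;
   irregular ws;
   model B = mu_b1-mu_b100 x1-x5 bTime wb;
   model S = mu_s1-mu_s50 w1-w8 sTime ws;
   output out=for press;
run;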
One can consider many other variations of this model, including your model with lagged response
values. However, PROC SSM is not very scalable for such models when
there are so many (thousands according to you) panels.
02-28-2017 03:44 PM
Thanks for your response. However, the model you suggest does not address the simultaneity in my data set. B depends on S, and S depends on B. But I still appreciate your response.
02-28-2017 04:11 PM
I understand. Models that include terms signifying association between the i-th buyer and j-th seller for each (i,j) pair become quite large. At the moment, SSM will not scale for such problems. If the number of such pairs is small, say fewer than 50, then it might be feasible.