BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi all!

In SAS, is it possible to estimate FE or RE models using the instrumental variables approach?

I know that TSCSREG and MIXED procedures provide the means to estimate cross-sectional fixed or random effects and that SYSLIN provides the instrumental variable approach, but:

How do I do both? 🙂

Thanks!
1 REPLY 1
Dale
Pyrite | Level 9
You can estimate a structural equation model employing the NLMIXED procedure. The NLMIXED procedure also allows estimation of random effects.

Code for the structural equation model employing the NLMIXED procedure is not exactly straightforward, nor is the fit of a SEM for a fixed-effect only model using the NLMIXED procedure nearly as fast as the fit which is obtained employing the SYSLIN procedure. But the flexibility of the NLMIXED procedure to allow one to fit all kinds of models makes it a great tool to know.

The SYSLIN documentation provides an example of a supply and demand model. The quantity demanded is a function of the price of the product, the price of substitutes for the product, and the income of the purchaser. The quantity supplied is a function of the price of the product and the cost of producing the product. The quantity demanded and the quantity supplied are assumed to be in equilibrium.

Both quantity and product price are endogenous variables. The price of substitutes, income of the purchaser, and cost of producing the product are exogenous variables. We could fit a two-stage least squares model in which the product price is estimated as a function of exogenous variables and the predicted price is then substituted in the supply and demand function models. The NLMIXED code shown below is essentially a two-stage regression model in that there is a likelihood function for product price and a likelihood function for the supply and demand models. We maximize the two likelihood functions together. Note that the supply and demand model is a multivariate response, so the likelihood function for the supply and demand model is given as a multivariate normal density.

Supply and demand functions are conditionally independent. Conditional on the predicted price function, supply and demand are independent (the residual covariance when predicted price is included in the model is zero). This makes computing the inverse and determinant of the residual covariance matrix simple.

Below are the data and the invocation of the SYSLIN procedure employing a two-stage least squares estimator to fit the model.

data in;
  label q = "Quantity"
                p = "Price"
                s = "Price of Substitutes"
                y = "Income"
                u = "Unit Cost";
    drop i e1 e2;
    p = 0; q = 0;
    do i = 1 to 60;
          y = 1 + .05*i + .15*rannor(123);
          u = 2 + .05*rannor(123) + .05*rannor(123);
          s = 4 - .001*(i-10)*(i-110) + .5*rannor(123);
          e1 = .15 * rannor(123);
          e2 = .15 * rannor(123);
          demandx = 1 + .3 * y + .35 * s + e1;
          supplyx = -1 - 1 * u + e2 - .4*e1;
          q = 1.4/2.15 * demandx + .75/2.15 * supplyx;
          p = ( - q + supplyx ) / -1.4;
          output;
    end;
run;


proc syslin data=in 2sls first;
    endogenous p;
    instruments y u s;
    demand: model q = p y s;
    supply: model q = p u;
run;



Now, in contrast, here is the NLMIXED solution for the (fixed-effect) supply and demand model with endogenous price. (Note that I employ a utility macro to perform some matrix multiplication when maximizing the multivariate supply and demand likelihood function.)

%macro mat_mult(result=, left=, right=);
  /* This macro multiplies two matrices */
  /* and returns the result in a matrix */
  _n_nrowL = dim(&left,1);
  _n_ncolL = dim(&left,2);
  _n_nrowR = dim(&right,1);
  _n_ncolR = dim(&right,2);
  do _i_row=1 to _n_nrowL;
      do _i_col=1 to _n_ncolR;
          __sum=0;
          do _i_j=1 to _n_ncolL;
              __sum + &left[_i_row, _i_j] * &right[_i_j, _i_col];
          end;
          &result[_i_row, _i_col] = __sum;
      end;
  end;
%mend mat_mult;


proc nlmixed data=in tech=trureg;
  parms b0_s1 1 bY_s1 0.15 bU_s1 0.45 bS_s1 0.15
              b0_d 2 bP_d -1.2 bY_d 0.4 bS_d 0.4
              b0_s -0.5 bP_s 1.3 bU_s -1.1
              V_p V_qd V_qs 0.01;

  /* Likelihood function for endogenous price variable */
  phat = b0_s1 + bY_s1*y + bU_s1*u + bS_s1*s;
  ll_1 = -0.5*(log(2*constant('PI')) + log(V_p) +
                            ((p-phat)**2)/V_p);


  /* Likelihood function for supply and demand functions */

  /* Construct vector of expected values of the response */
  q_d = b0_d + bP_d*phat + bY_d*y + bS_d*s;
  q_s = b0_s + bP_s*phat + bU_s*u;
  array _vec_MU [2] q_d q_s;

  /* Construct vector of observed response values */
  _q = q; /* Same response for supply and demand functions */
  array _vec_Y [2] q _q;

  /* Vector of residuals and transposed residuals */
  array _vec_RESID [1,2] _temporary_;
  array _vec_RESIDTRANS [2,1] _temporary_;
  do _i_t=1 to 2;
      _vec_RESID[1,_i_t] = _vec_Y[_i_t] - _vec_MU[_i_t];
      _vec_RESIDTRANS[_i_t,1] = _vec_RESID[1,_i_t];
  end;

  /* Declare residual covariance matrix */
  array _mat_V [2,2] _temporary_;
  _mat_V[1,1] = V_qd;
  _mat_V[1,2] = 0; _mat_V[2,1]=0;
  _mat_V[2,2] = V_qs;


  /* Compute inverse(V) & det(V) */
  array _mat_VINV [2,2] _temporary_;
  det = 1;
  do i=1 to 2;
      do j=1 to 2;
          if i=j then do;
              _mat_Vinv[i,j] = 1 / _mat_V[i,j];
              det = det*_mat_V[i,j];
          end;
          else _mat_Vinv[i,j] = 0;
      end;
  end;

  /* Compute resid * Vinv * resid' */
  array _vec_RESIDxVINV [1,2] _temporary_;
  array _n_RESIDxVINVxRESIDTRANS [1,1] _temporary_;
  %mat_mult(result=_vec_RESIDxVINV,
                      left=_vec_RESID, right=_mat_VINV);
  %mat_mult(result=_n_RESIDxVINVxRESIDTRANS,
                      left=_vec_RESIDxVINV, right=_vec_RESIDTRANS);

  /* Supply and demand log-likelihood function */
  /* mu and antedependence parameters */
  dim = dim(_vec_y);
  ll_2 = -0.5*(dim*log(2*constant('PI')) + log(det) +
                                _n_RESIDxVINVxRESIDTRANS[1,1]);

  /* Joint log-likelihood functions is sum of the price */
  /* log-likelihood and the supply/demand log-likelihood. */
  ll = ll_1 + ll_2;

  /* Maximize the joint likelihood function */
  model ll ~ general(ll);
run;



You will observe that the NLMIXED code requires considerably more work to set up. But we can add random effects to the price, supply, and demand functions. Suppose that you had data on supply and demand in a number of markets and that you determined that a random intercept model for price, supply, and demand functions was needed. The NLMIXED code below modifies the previous code by adding these random effects to the expectation functions and by adding a RANDOM statement at the end.


proc nlmixed data=in tech=trureg;
  parms b0_s1 1 bY_s1 0.15 bU_s1 0.45 bS_s1 0.15
              b0_d 2 bP_d -1.2 bY_d 0.4 bS_d 0.4
              b0_s -0.5 bP_s 1.3 bU_s -1.1
              V_p V_qd V_qs 0.01;

  /* Likelihood function for endogenous price variable */
  phat = b0_s1 + bY_s1*y + bU_s1*u + bS_s1*s + gamma_price;
  ll_1 = -0.5*(log(2*constant('PI')) + log(V_p) +
                            ((p-phat)**2)/V_p);


  /* Likelihood function for supply and demand functions */

  /* Construct vector of expected values of the response */
  q_d = b0_d + bP_d*phat + bY_d*y + bS_d*s + gamma_demand;
  q_s = b0_s + bP_s*phat + bU_s*u + gamma_supply;
  array _vec_MU [2] q_d q_s;

  /* Construct vector of observed response values */
  _q = q; /* Same response for supply and demand functions */
  array _vec_Y [2] q _q;

  /* Vector of residuals and transposed residuals */
  array _vec_RESID [1,2] _temporary_;
  array _vec_RESIDTRANS [2,1] _temporary_;
  do _i_t=1 to 2;
      _vec_RESID[1,_i_t] = _vec_Y[_i_t] - _vec_MU[_i_t];
      _vec_RESIDTRANS[_i_t,1] = _vec_RESID[1,_i_t];
  end;

  /* Declare residual covariance matrix */
  array _mat_V [2,2] _temporary_;
  _mat_V[1,1] = V_qd;
  _mat_V[1,2] = 0; _mat_V[2,1]=0;
  _mat_V[2,2] = V_qs;


  /* Compute inverse(V) & det(V) */
  array _mat_VINV [2,2] _temporary_;
  det = 1;
  do i=1 to 2;
      do j=1 to 2;
          if i=j then do;
              _mat_Vinv[i,j] = 1 / _mat_V[i,j];
              det = det*_mat_V[i,j];
          end;
          else _mat_Vinv[i,j] = 0;
      end;
  end;

  /* Compute resid * Vinv * resid' */
  array _vec_RESIDxVINV [1,2] _temporary_;
  array _n_RESIDxVINVxRESIDTRANS [1,1] _temporary_;
  %mat_mult(result=_vec_RESIDxVINV,
                      left=_vec_RESID, right=_mat_VINV);
  %mat_mult(result=_n_RESIDxVINVxRESIDTRANS,
                      left=_vec_RESIDxVINV, right=_vec_RESIDTRANS);

  /* Supply and demand log-likelihood function */
  /* mu and antedependence parameters */
  dim = dim(_vec_y);
  ll_2 = -0.5*(dim*log(2*constant('PI')) + log(det) +
                                _n_RESIDxVINVxRESIDTRANS[1,1]);

  /* Joint log-likelihood functions is sum of the price */
  /* log-likelihood and the supply/demand log-likelihood. */
  ll = ll_1 + ll_2;

  /* Maximize the joint likelihood function */
  model ll ~ general(ll);

random gamma_price gamma_demand gamma_supply
           ~ normal([0,0,0],
                        [V_gamma_price, 0, 0,
                          V_gamma_demand, 0,
                          V_gamma_supply]) subject=market;
run;


Let me mention that I am not terribly familiar with structural equation models and methods. The SYSLIN procedure has two maximum likelihood estimation methods (FIML and LIML). The model shown above is neither of these. As stated near the top of this post, the model shown here is a maximum likelihood model which fits a two-stage model. Presumably, one could estimate the FIML or LIML model employing the NLMIXED procedure. I just don't know the details of how the likelihood is constructed for FIML or LIML to implement them in NLMIXED. But the following code demonstrates that the two-stage likelihood model shown does estimate the parameters of the (fixed-effect) structural equation model with essentially the same characteristics of the estimators as methods implemented in SYSLIN.



data in;
  label q = "Quantity"
                p = "Price"
                s = "Price of Substitutes"
                y = "Income"
                u = "Unit Cost";
    drop i e1 e2;
    p = 0; q = 0;
    do trial=1 to 100;
        do i = 1 to 500;
              y = 1 + .05*i + .15*rannor(123);
              u = 2 + .05*rannor(123) + .05*rannor(123);
              s = 4 - .001*(i-10)*(i-110) + .5*rannor(123);
              e1 = .15 * rannor(123);
              e2 = .15 * rannor(123);
              demandx = 1 + .3 * y + .35 * s + e1;
              supplyx = -1 - 1 * u + e2 - .4*e1;
              q = 1.4/2.15 * demandx + .75/2.15 * supplyx;
              p = ( - q + supplyx ) / -1.4;
              output;
        end;
    end;
run;


ods listing close;
ods output parameterEstimates=Syslin1_parms;
proc syslin data=in 2sls first;
    by trial;
    endogenous p;
    instruments y u s;
    demand: model q = p y s;
    supply: model q = p u;
run;


ods output parameterEstimates=Syslin2_parms;
proc syslin data=in fiml;
    by trial;
    endogenous p;
    instruments y u s;
    demand: model q = p y s;
    supply: model q = p u;
run;


ods output parameterEstimates=Syslin3_parms;
proc syslin data=in liml;
    by trial;
    endogenous p;
    instruments y u s;
    demand: model q = p y s;
    supply: model q = p u;
run;


ods output parameterEstimates=NLMIXED_parms;
proc nlmixed data=in tech=trureg;
  by trial;
  parms b0_s1 1 bY_s1 0.15 bU_s1 0.45 bS_s1 0.15
              b0_d 2 bP_d -1.2 bY_d 0.4 bS_d 0.4
              b0_s -0.5 bP_s 1.3 bU_s -1.1
              V_p V_qd V_qs 0.01;

  /* Likelihood function for endogenous price variable */
  phat = b0_s1 + bY_s1*y + bU_s1*u + bS_s1*s;
  ll_1 = -0.5*(log(2*constant('PI')) + log(V_p) +
                            ((p-phat)**2)/V_p);


  /* Likelihood function for supply and demand functions */

  /* Construct vector of expected values of the response */
  q_d = b0_d + bP_d*phat + bY_d*y + bS_d*s;
  q_s = b0_s + bP_s*phat + bU_s*u;
  array _vec_MU [2] q_d q_s;

  /* Construct vector of observed response values */
  _q = q; /* Same response for supply and demand functions */
  array _vec_Y [2] q _q;

  /* Vector of residuals and transposed residuals */
  array _vec_RESID [1,2] _temporary_;
  array _vec_RESIDTRANS [2,1] _temporary_;
  do _i_t=1 to 2;
      _vec_RESID[1,_i_t] = _vec_Y[_i_t] - _vec_MU[_i_t];
      _vec_RESIDTRANS[_i_t,1] = _vec_RESID[1,_i_t];
  end;

  /* Declare residual covariance matrix */
  array _mat_V [2,2] _temporary_;
  _mat_V[1,1] = V_qd;
  _mat_V[1,2] = 0; _mat_V[2,1]=0;
  _mat_V[2,2] = V_qs;


  /* Compute inverse(V) & det(V) */
  array _mat_VINV [2,2] _temporary_;
  det = 1;
  do i=1 to 2;
      do j=1 to 2;
          if i=j then do;
              _mat_Vinv[i,j] = 1 / _mat_V[i,j];
              det = det*_mat_V[i,j];
          end;
          else _mat_Vinv[i,j] = 0;
      end;
  end;

  /* Compute resid * Vinv * resid' */
  array _vec_RESIDxVINV [1,2] _temporary_;
  array _n_RESIDxVINVxRESIDTRANS [1,1] _temporary_;
  %mat_mult(result=_vec_RESIDxVINV,
                      left=_vec_RESID, right=_mat_VINV);
  %mat_mult(result=_n_RESIDxVINVxRESIDTRANS,
                      left=_vec_RESIDxVINV, right=_vec_RESIDTRANS);

  /* Supply and demand log-likelihood function */
  /* mu and antedependence parameters */
  dim = dim(_vec_y);
  ll_2 = -0.5*(dim*log(2*constant('PI')) + log(det) +
                                _n_RESIDxVINVxRESIDTRANS[1,1]);

  /* Joint log-likelihood functions is sum of the price */
  /* log-likelihood and the supply/demand log-likelihood. */
  ll = ll_1 + ll_2;

  /* Maximize the joint likelihood function */
  model ll ~ general(ll);
run;





data syslin1_parms;
  set syslin1_parms;
  if mod(_n_,15)=1 then parameter="b0_s1"; else
  if mod(_n_,15)=2 then parameter="bY_s1"; else
  if mod(_n_,15)=3 then parameter="bU_s1"; else
  if mod(_n_,15)=4 then parameter="bS_s1"; else

  if mod(_n_,15)=9 then parameter="b0_d"; else
  if mod(_n_,15)=10 then parameter="bP_d"; else
  if mod(_n_,15)=11 then parameter="bY_d"; else
  if mod(_n_,15)=12 then parameter="bS_d"; else

  if mod(_n_,15)=13 then parameter="b0_s"; else
  if mod(_n_,15)=14 then parameter="bP_s"; else
  if mod(_n_,15)=0 then parameter="bU_s";
run;


data syslin2_parms;
  set syslin2_parms;
  if mod(_n_,7)=1 then parameter="b0_d"; else
  if mod(_n_,7)=2 then parameter="bP_d"; else
  if mod(_n_,7)=3 then parameter="bY_d"; else
  if mod(_n_,7)=4 then parameter="bS_d"; else

  if mod(_n_,7)=5 then parameter="b0_s"; else
  if mod(_n_,7)=6 then parameter="bP_s"; else
  if mod(_n_,7)=0 then parameter="bU_s";
run;


data syslin3_parms;
  set syslin3_parms;
  if mod(_n_,7)=1 then parameter="b0_d"; else
  if mod(_n_,7)=2 then parameter="bP_d"; else
  if mod(_n_,7)=3 then parameter="bY_d"; else
  if mod(_n_,7)=4 then parameter="bS_d"; else

  if mod(_n_,7)=5 then parameter="b0_s"; else
  if mod(_n_,7)=6 then parameter="bP_s"; else
  if mod(_n_,7)=0 then parameter="bU_s";
run;



ods listing;
title "Syslin 2sls - structural equation parameters";
proc means data=syslin1_parms(where=(parameter not in ("b0_s1","bS_s1","bU_s1","bY_s1")));
  class parameter;
  var estimate stderr;
run;


title "Syslin FIML - structural equation parameters";
proc means data=syslin2_parms;
  class parameter;
  var estimate stderr;
run;


title "Syslin LIML - structural equation parameters";
proc means data=syslin3_parms;
  class parameter;
  var estimate stderr;
run;


title "NLMIXED - structural equation parameters";
proc means data=NLMIXED_parms(where=(parameter not in ("V_p","V_qd","V_qs","b0_s1","bS_s1","bU_s1","bY_s1")));
  class parameter;
  var estimate standarderror;
run;




title "Syslin 2sls - stage 1 parameters";
proc means data=syslin1_parms(where=(parameter in ("b0_s1","bS_s1","bU_s1","bY_s1")));
  class parameter;
  var estimate stderr;
run;


title "NLMIXED - stage 1 parameters";
proc means data=NLMIXED_parms(where=(parameter in ("b0_s1","bS_s1","bU_s1","bY_s1")));
  class parameter;
  var estimate standarderror;
run;


That is probably enough to chew on for one post. I hope that this gives you sufficient information to complete your own analysis.

Oh, one more comment. I used TECH=TRUREG to optimize the likelihood function in the NLMIXED code. That is not the default method. The default method is faster, but the default method failed and crashed the NLMIXED procedure at the 22nd iteration of the simulation. You can always try the default method for optimizing the likelihood. It seems to work well most of the time. However, if you get into some difficulty, then you might want to use an optimization technique like TRUREG. The default method does not employ second derivatives of the parameters in the optimization algorithm. TRUREG and a couple of other methods do utilize second derivatives. Optimization algorithms which use second derivatives generally require more time to execute, but may have superior properties in many ways.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 2708 views
  • 0 likes
  • 2 in conversation