Problems when Absorbing two variables using Proc g...

08-10-2017 09:23 PM

Hi all,

I am running a fixed effects model that regresses outcome on A while controlling for both student FE (studentid) and course FE (courseid). I used PROC GLM. My understanding is that I can absorb both studentid and courseid.

In my first model, I only absorbed one variable:

Proc glm data=data;

Class courseid;

Absorb studentid;

Model outcome=A courseid;

Run;

I had no problems running the model, and the coefficients on A look correct. However, when I put both courseid and studentid in the ABSORB statement, SAS fails to provide valid coefficients for A. Shouldn't the two models yield exactly the same results?

Any insights are appreciated!

Posted in reply to Xudeer

08-10-2017 09:46 PM

Posted in reply to Xudeer

08-11-2017 04:17 AM

Agreed, it should give the same estimates for A, unless courseID is nested in A. In that case, absorbing courseID in practice also absorbs A.
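One way to check the nesting condition described above is to count how many distinct levels of A occur within each course. This is only a sketch: it assumes the data set and variable names from the first post (`data`, `courseid`, `A`), and the column aliases are made up for illustration.

```sas
/* Sketch: is courseid nested in A?                              */
/* Assumes the data set WORK.DATA and the variables courseid and */
/* A from the first post; n_a and the output alias are made up.  */
proc sql;
  select max(n_a) as max_A_levels_per_course
  from (select courseid,
               count(distinct A) as n_a   /* levels of A seen in this course */
        from work.data
        group by courseid);
quit;
```

If the reported maximum is 1, every course appears with a single level of A, so the A dummies are linear combinations of the course dummies and absorbing courseid removes A as well. A maximum above 1 means at least some courses appear with more than one level of A.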

Posted in reply to JacobSimonsen

08-14-2017 12:59 PM

Thank you Jacob!

That is why I got confused. My courseID is not nested in A. A is actually teacher ID. The majority of courses are taught by multiple college instructors, and each instructor teaches multiple courses as well. The most confusing thing is that when I absorb only student ID and add courseID and my key variable A (instructorID) as dummy variables, my model is totally fine:

SAS Output

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| Model | 82000 | 32715.80481 | 0.39897 | 2.39 | <.0001 |
| Error | 241302 | 40251.64007 | 0.16681 | | |
| Corrected Total | 323302 | 72967.44488 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
| --- | --- | --- | --- |
| 0.448362 | 118.6951 | 0.408424 | 0.344095 |

However, once I absorb both studentID and courseID, the model explodes, indicating that something is wrong:

SAS Output

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| Model | 323250 | 72954.59899 | 0.22569 | 0.91 | 0.7029 |
| Error | 52 | 12.84589 | 0.24704 | | |
| Corrected Total | 323302 | 72967.44488 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
| --- | --- | --- | --- |
| 0.999824 | 144.4448 | 0.497028 | 0.344095 |

Posted in reply to Xudeer

08-14-2017 01:28 PM

You have only shown us the code for one of the models. Your description seems to indicate a fairly straightforward change to the code for the second model, but it would still be nice if you showed it to us.

My other concern is that you have 82000 df for the model in the first output. This doesn't seem to be a likely number; it seems way too large for any type of teacher/testing scenario I am aware of. The total degrees of freedom, over 300,000, is also way too large for such a scenario. Can you explain why these numbers are so large?

Posted in reply to PaigeMiller

08-14-2017 08:47 PM

Thank you for the response, PaigeMiller!

Here is my first model that only absorbs studentID:

proc glm data=derived.fouryear;

absorb student_nid;

class instructor_nid coursenid_ft;

model second_2yr=instructor_nid coursenid_ft / solution;

run;

Here are the outputs:

SAS Output

Dependent Variable: second_2yr

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| Model | 81979 | 31583.90047 | 0.38527 | 2.25 | <.0001 |
| Error | 241323 | 41383.54441 | 0.17149 | | |
| Corrected Total | 323302 | 72967.44488 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
| --- | --- | --- | --- |
| 0.432849 | 120.3472 | 0.414109 | 0.344095 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| student_nid | 68606 | 17514.98225 | 0.25530 | 1.49 | <.0001 |
| instructor_nid | 7867 | 10393.42479 | 1.32114 | 7.70 | <.0001 |
| coursenid_ft | 5506 | 3675.49343 | 0.66754 | 3.89 | <.0001 |

In the second model, everything remains the same, except that now I absorb courseID rather than having it as dummies:

proc sort data=derived.fouryear; by student_nid coursenid_ft;run;

proc glm data=derived.fouryear;

absorb student_nid coursenid_ft;

class instructor_nid;

model second_2yr=instructor_nid/ solution;

run;

Here are the outputs from the second model:

SAS Output

Dependent Variable: second_2yr

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| Model | 323229 | 72948.94614 | 0.22569 | 0.89 | 0.7788 |
| Error | 73 | 18.49874 | 0.25341 | | |
| Corrected Total | 323302 | 72967.44488 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
| --- | --- | --- | --- |
| 0.999746 | 146.2955 | 0.503396 | 0.344095 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| student_nid | 68606 | 17514.98225 | 0.25530 | 1.01 | 0.5041 |
| coursenid_(IN ABOVE) | 254558 | 55420.96262 | 0.21771 | 0.86 | 0.8402 |
| instructor_nid | 65 | 13.00126 | 0.20002 | 0.79 | 0.8338 |

Posted in reply to Xudeer

08-15-2017 07:02 AM

You haven't addressed why there are so many degrees of freedom; this seems like an incredibly large number.

However, from the ABSORB documentation:

"Several variables can be specified, in which case each one is assumed to be nested in the preceding variable in the ABSORB statement."

So your two models are not equivalent.

Also, from the documentation:

"When you use the ABSORB statement, the data set (or each BY group, if a BY statement appears) must be sorted by the variables in the ABSORB statement."
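To see what the nesting assumption does to this particular data set, one can count how many distinct student-by-course cells it contains. The query below is a sketch that reuses the data set and variable names from the posted code; the column aliases and the `'|'` separator are arbitrary choices. Reading the posted Type I table, the absorbed terms use 68606 + 254558 df, which suggests roughly 323,165 distinct cells among about 323,303 observations. That would be close to one absorbed parameter per record, and it would account for the R-Square near 1 and the 73 error df.

```sas
/* Sketch: how many cells does ABSORB student_nid coursenid_ft create? */
/* Data set and variable names are taken from the posted code;         */
/* the aliases are hypothetical.                                       */
proc sql;
  select count(*)                    as n_obs,
         count(distinct student_nid) as n_students,
         /* one nested level per distinct (student, course) pair */
         count(distinct catx('|', student_nid, coursenid_ft))
                                     as n_student_course_cells
  from derived.fouryear;
quit;
```

If `n_student_course_cells` is nearly as large as `n_obs`, the nested absorption leaves almost no within-cell variation from which to estimate the instructor effects.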

Posted in reply to PaigeMiller

08-15-2017 06:33 PM

Thank you PaigeMiller!

You haven't addressed why there are so many degrees of freedom, this seems like an incredibly large number.

- We have more than 300,000 observations. The data are student-by-course transcript records from multiple cohorts of students across an entire four-year public college system.

However, from the ABSORB documentation

Several variables can be specified, in which case each one is assumed to be nested in the preceding variable in the ABSORB statement.

So your two models are not equivalent.

- I see. What I would like to do is absorb studentID and courseID, which are not nested within each other. Is there any model in SAS that would allow that?

Also, from the documentation

When you use the ABSORB statement, the data set (or each BY group, if a BY statement appears) must be sorted by the variables in the ABSORB statement.

- Yes, I sorted the data by the ABSORB variables before running the procedure.

Posted in reply to Xudeer

08-16-2017 04:46 AM

I don't think there is any procedure that does what you want. Theoretically, though, it is possible to "absorb" non-nested variables. As you may know, the absorb method projects the data and the column vectors of the design matrix onto the orthogonal complement of the space spanned by the design vectors defined by the variable(s) in the ABSORB statement. This is quite simple if there is only one class variable in the ABSORB statement. If there are more (non-nested) variables, the projection becomes more complicated in terms of calculation time. I experimented with this some years ago and didn't see any time-efficient way to do it. That may be the reason it is not possible with PROC GLM either.
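The projection argument can be written out as follows. This is a standard textbook formulation (Frisch-Waugh-Lovell), not a description of PROC GLM internals; the symbols $D$, $X$, $M_D$ and $g(i)$ are notation introduced here for illustration.

```latex
\begin{align*}
% Model: D holds the dummy columns of the absorbed effects,
% X holds the remaining regressors.
y &= D\alpha + X\beta + \varepsilon \\
% Projection onto the orthogonal complement of the column space of D
% (generalized inverse, since D may be rank deficient).
M_D &= I - D\,(D^{\top} D)^{-} D^{\top} \\
% Estimator of beta after absorbing D.
\hat{\beta} &= (X^{\top} M_D X)^{-} X^{\top} M_D\, y \\
% One absorbed factor: M_D subtracts the mean of the group g(i)
% that observation i belongs to.
(M_D\, y)_i &= y_i - \bar{y}_{g(i)} \\
% Two non-nested factors, D = [D_1 \; D_2]: a single pass of
% demeaning by each factor is not enough in general,
% but repeated alternation converges to the joint projection.
M_{D_1} M_{D_2} &\neq M_D \quad \text{in general}, \qquad
\lim_{k \to \infty} \left( M_{D_1} M_{D_2} \right)^{k} = M_D
\end{align*}
```

With one factor, the projection is a single pass of group-wise demeaning, which is cheap when the data are sorted by that factor. With two crossed factors, the joint projection has no such closed form, and the alternating scheme may need many passes over the data, which is consistent with the computation-time concern described above.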

Posted in reply to JacobSimonsen

08-16-2017 01:18 PM

Thank you Jacob! For your information, Stata can absorb multiple non-nested variables, but it runs extremely slowly on a large dataset such as mine.

I guess for SAS, I will have to absorb only one variable while adding the other as dummies?
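That workaround can be checked on a small example. The sketch below uses a made-up toy data set (the data set `toy` and all of its values are hypothetical) and the variable names from the first post. It fits the same model twice, once with studentid absorbed and once with studentid entered as explicit dummies; the estimate for A should agree between the two fits.

```sas
/* Hypothetical toy data: 3 students crossed with 2 courses. */
data toy;
  input studentid courseid A outcome;
  datalines;
1 1 0.5 2.1
1 2 1.0 2.9
2 1 0.2 1.0
2 2 0.9 1.8
3 1 0.4 3.2
3 2 1.1 4.1
;
run;

/* ABSORB requires the data to be sorted by the absorbed variable. */
proc sort data=toy;
  by studentid;
run;

/* Fit 1: student effects absorbed, course effects as dummies. */
proc glm data=toy;
  absorb studentid;
  class courseid;
  model outcome = A courseid / solution;
run;
quit;

/* Fit 2: both student and course effects as explicit dummies. */
proc glm data=toy;
  class studentid courseid;
  model outcome = A studentid courseid / solution;
run;
quit;
```

On the real data, absorbing the variable with the most levels (here student_nid) and keeping the other in the CLASS statement keeps the number of explicit dummy columns as small as possible.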