## Problems when Absorbing two variables using Proc glm

Occasional Contributor
Posts: 6

Hi all,

I am running a fixed effects model that regresses an outcome on A while controlling for both student fixed effects (studentid) and course fixed effects (courseid). I am using PROC GLM, and my understanding is that I can absorb both studentid and courseid.
In my first model, I absorbed only one variable:
Proc glm data=data;
Class courseid;
Absorb studentid;
Model outcome=A courseid;
Run;

I had no problems running this model, and the coefficients on A look correct. However, when I put both courseid and studentid in the ABSORB statement, SAS fails to provide valid coefficients for A. Shouldn't the two models yield exactly the same results?
Any insights are appreciated!
Super Contributor
Posts: 298

## Re: Problems when Absorbing two variables using Proc glm

Agreed, it should give the same estimates for A, unless courseID is nested in A: in that case, absorbing courseID in practice absorbs A as well.

Occasional Contributor
Posts: 6

## Re: Problems when Absorbing two variables using Proc glm

Posted in reply to JacobSimonsen

Thank you Jacob!

That is why I got confused. My courseID is not nested in A. A is actually the teacher ID. The majority of courses are taught by multiple college instructors, and each instructor teaches multiple courses as well. The most confusing thing is that when I absorb only student ID and add courseID and my key variable A (instructorID) as dummy variables, my model is totally fine:

SAS Output

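| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 82000 | 32715.8 | 0.39897 | 2.39 | <.0001 |
| Error | 241302 | 40251.6 | 0.16681 | | |
| Corrected Total | 323302 | 72967.4 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
|---|---|---|---|
| 0.448362 | 118.695 | 0.408424 | 0.344095 |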

However, once I absorb both studentID and courseID, the model explodes, indicating that something is wrong:

SAS Output

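| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 323250 | 72954.6 | 0.22569 | 0.91 | 0.7029 |
| Error | 52 | 12.8459 | 0.24704 | | |
| Corrected Total | 323302 | 72967.4 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
|---|---|---|---|
| 0.999824 | 144.445 | 0.497028 | 0.344095 |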

Trusted Advisor
Posts: 1,915

## Re: Problems when Absorbing two variables using Proc glm

You have only shown us the code for one of the models. Your description suggests a fairly straightforward change to the code for the second model, but it would still be nice if you showed it to us.

My other concern is that you have 82000 df for the model in the first output. That does not seem like a plausible number; it is far too large for any teacher/testing scenario I am aware of. The total degrees of freedom, over 300,000, also seems far too large. Can you explain why these numbers are so large?

Occasional Contributor
Posts: 6

## Re: Problems when Absorbing two variables using Proc glm

Posted in reply to PaigeMiller

Thank you for the response, PaigeMiller!

Here is my first model that only absorbs studentID:

proc glm data=derived.fouryear;
absorb student_nid;
class instructor_nid coursenid_ft;
model second_2yr=instructor_nid coursenid_ft / solution;
run;

Here are the outputs:

SAS Output

Dependent Variable: second_2yr

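| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 81979 | 31583.9 | 0.38527 | 2.25 | <.0001 |
| Error | 241323 | 41383.5 | 0.17149 | | |
| Corrected Total | 323302 | 72967.4 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
|---|---|---|---|
| 0.432849 | 120.347 | 0.414109 | 0.344095 |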

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| student_nid | 68606 | 17515 | 0.2553 | 1.49 | <.0001 |
| instructor_nid | 7867 | 10393.4 | 1.32114 | 7.7 | <.0001 |
| coursenid_ft | 5506 | 3675.49 | 0.66754 | 3.89 | <.0001 |

In the second model, everything remains the same, except that now I absorb courseID rather than having it as dummies:

proc sort data=derived.fouryear; by student_nid coursenid_ft;run;
proc glm data=derived.fouryear;
absorb student_nid coursenid_ft;
class instructor_nid;
model second_2yr=instructor_nid/ solution;
run;

Here are the outputs from the second model:

SAS Output

Dependent Variable: second_2yr

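| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 323229 | 72948.9 | 0.22569 | 0.89 | 0.7788 |
| Error | 73 | 18.4987 | 0.25341 | | |
| Corrected Total | 323302 | 72967.4 | | | |

| R-Square | Coeff Var | Root MSE | second_2yr Mean |
|---|---|---|---|
| 0.999746 | 146.296 | 0.503396 | 0.344095 |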

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| student_nid | 68606 | 17515 | 0.2553 | 1.01 | 0.5041 |
| coursenid_(IN ABOVE) | 254558 | 55421 | 0.21771 | 0.86 | 0.8402 |
| instructor_nid | 65 | 13.0013 | 0.20002 | 0.79 | 0.8338 |

Trusted Advisor
Posts: 1,915

## Re: Problems when Absorbing two variables using Proc glm

You haven't addressed why there are so many degrees of freedom. This seems like an incredibly large number.

However, from the ABSORB documentation:

"Several variables can be specified, in which case each one is assumed to be nested in the preceding variable in the ABSORB statement."

So your two models are not equivalent.

Also, from the documentation:

"When you use the ABSORB statement, the data set (or each BY group, if a BY statement appears) must be sorted by the variables in the ABSORB statement."
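The nesting assumption also accounts for the second output above: student_nid takes 68606 df and the nested course effect takes 254558 df, so 323164 of the 323302 total df are absorbed before instructor_nid is fitted. A minimal sketch of why this happens, written in Python rather than SAS and using a hypothetical toy transcript (the data and the helper `demean_by` are illustrative assumptions, not part of PROC GLM):

```python
from collections import defaultdict

def demean_by(values, keys):
    """Subtract each group's mean (groups defined by keys) from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for v, k in zip(values, keys)]

# Hypothetical toy transcript: every student takes a given course only once.
student = ["s1", "s1", "s2", "s2", "s3", "s3"]
course = ["c1", "c2", "c1", "c3", "c2", "c3"]
y = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]

# Nested reading: the second absorbed effect is course-within-student,
# so it has one level per distinct (student, course) pair.
nested_cells = list(zip(student, course))
nested_resid = demean_by(demean_by(y, student), nested_cells)

# Degrees of freedom left after absorbing student and course(student).
df_student = len(set(student)) - 1
df_course_in_student = len(set(nested_cells)) - len(set(student))
residual_df = (len(y) - 1) - df_student - df_course_in_student

print(residual_df, max(abs(r) for r in nested_resid))
```

When each (student, course) pair occurs once, the nested cells coincide with the observations, so nothing is left over for any other term in the model.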

Occasional Contributor
Posts: 6

## Re: Problems when Absorbing two variables using Proc glm

Posted in reply to PaigeMiller

Thank you PaigeMiller!

"You haven't addressed why there are so many degrees of freedom, this seems like an incredibly large number."

- We have more than 300,000 observations (the data are student-by-course transcript records from multiple cohorts of students across an entire four-year public college system).

"However, from the ABSORB documentation: Several variables can be specified, in which case each one is assumed to be nested in the preceding variable in the ABSORB statement. So your two models are not equivalent."

- I see. What I would like to do is absorb studentID and courseID, which are not nested within each other. Is there any procedure in SAS that allows that?

"Also, from the documentation: When you use the ABSORB statement, the data set (or each BY group, if a BY statement appears) must be sorted by the variables in the ABSORB statement."

- Yes, I sorted the data set by those variables before running the procedure.

Super Contributor
Posts: 298

## Re: Problems when Absorbing two variables using Proc glm

I don't think there is any procedure that does what you want. Theoretically, though, it is possible to "absorb" non-nested variables. As you may know, the absorb method projects the data and the column vectors of the design matrix onto the orthogonal complement of the space spanned by the design vectors of the variable(s) in the ABSORB statement. This is quite simple with only one class variable in the ABSORB statement. With several non-nested variables, the projection becomes more complicated in terms of calculation time. I experimented with this some years ago and did not find a time-efficient way to do it, which may be why PROC GLM does not support it.
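A minimal sketch of that projection for two crossed (non-nested) factors, written in Python rather than SAS. It uses alternating demeaning, which is one possible approach and not what PROC GLM does. The toy data, the single numeric regressor `x`, and the function names are all hypothetical; a class variable such as instructor ID would need one residualized column per dummy.

```python
from collections import defaultdict

def demean_by(values, keys):
    """Subtract each group's mean (groups defined by keys) from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for v, k in zip(values, keys)]

def absorb_two_way(values, keys_a, keys_b, tol=1e-10, max_iter=10_000):
    """Alternate the two demeaning steps until the vector stops changing."""
    current = list(values)
    for _ in range(max_iter):
        updated = demean_by(demean_by(current, keys_a), keys_b)
        change = max(abs(u - c) for u, c in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current

# Hypothetical unbalanced data in which students and courses are crossed.
student = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
course = ["c1", "c2", "c3", "c1", "c2", "c2", "c3", "c1", "c3"]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0, 1.0, 2.0, 7.0]

# Outcome built as 2*x plus additive student and course effects (assumed values).
student_effect = {"s1": 0.5, "s2": -1.0, "s3": 2.0, "s4": 0.0}
course_effect = {"c1": 1.0, "c2": -0.5, "c3": 3.0}
y = [2.0 * xi + student_effect[s] + course_effect[c]
     for xi, s, c in zip(x, student, course)]

# Residualize both sides, then regress residual on residual.
x_resid = absorb_two_way(x, student, course)
y_resid = absorb_two_way(y, student, course)
slope = sum(a * b for a, b in zip(x_resid, y_resid)) / sum(a * a for a in x_resid)
print(round(slope, 6))
```

Each pass costs only two sweeps over the data, but the number of passes needed depends on how students and courses overlap, which is consistent with the calculation-time concern described above.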

Occasional Contributor
Posts: 6

## Re: Problems when Absorbing two variables using Proc glm

Posted in reply to JacobSimonsen

Thank you, Jacob! For your information, Stata can absorb multiple non-nested variables, but it runs extremely slowly on a large data set such as mine.

I guess that in SAS I will have to absorb only one variable and add the other as dummies?
