DavidJ
Calcite | Level 5

I am getting the following notes using PROC REG in my output:

Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

other =Intercept - lung - heart - esrd
female =Intercept - male
other_race =Intercept - white_race - black_race - hisp_race

The three groups of variables in the model are used as flags: within each group, one and only one of the variables can have a value of 1, and the others are 0 (for example, if male is 1, female is 0, and vice versa). The other variables listed in the MODEL statement are non-binary.

Code looks as follows:

proc reg data=data alpha=.05;

   model age=risk_score living lung heart esrd other count_visits male female white_race black_race hisp_race other_race median_income pop_density pct_rental;

   plot predicted.*residual. / name=Graph1;

run;

quit;

I am not a statistician, so I can't really explain to my client what is happening. Why is SAS setting the "other" parameters to 0? I get the same behavior (same MODEL statement) using PROC AUTOREG.

1 ACCEPTED SOLUTION

Reeza
Super User

You've overparameterized your model. Basically, if you have a categorical variable such as male/female, you include only one of the flags in the model, not both; otherwise the second will get set to 0. Same issue with race.

You can check any intro regression text on how to code categorical variables or see here:

http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter5/sasreg5.htm
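
As a rough sketch of that advice, here is the original MODEL statement with one flag dropped from each group. The dataset and variable names come from the question; which flag to omit from each group (here other, female, and other_race, the same ones SAS set to 0) is an assumption, and any one flag per group would work.

```sas
/* Drop one flag from each group so the design matrix is full rank.  */
/* The omitted flags (other, female, other_race) act as the          */
/* reference levels; this choice is illustrative, not required.      */
proc reg data=data alpha=.05;
   model age = risk_score living lung heart esrd count_visits
               male white_race black_race hisp_race
               median_income pop_density pct_rental;
run;
quit;
```

Each remaining flag's coefficient is then read relative to the omitted level. For example, the coefficient on male is the estimated difference in mean age between males and females, with the other predictors held fixed.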


data_null__
Jade | Level 19

Do those variables really need to be removed? I don't think so. This is parameterized the same way GLM would parameterize it if GENDER and RACE were CLASS variables.
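
For comparison, a minimal sketch of the CLASS-variable approach mentioned here. The original data only has 0/1 flags, so the DATA step below builds categorical variables first; the names data_class, gender, race, and condition are hypothetical and not from the original post.

```sas
/* Hypothetical step: collapse each group of 0/1 flags into a single */
/* categorical variable (gender, race, condition are made-up names). */
data data_class;
   set data;
   length gender $6 race $8 condition $5;
   if male = 1 then gender = 'male';
   else gender = 'female';
   if white_race = 1 then race = 'white';
   else if black_race = 1 then race = 'black';
   else if hisp_race = 1 then race = 'hispanic';
   else race = 'other';
   if lung = 1 then condition = 'lung';
   else if heart = 1 then condition = 'heart';
   else if esrd = 1 then condition = 'esrd';
   else condition = 'other';
run;

/* PROC GLM builds one column per class level; SOLUTION prints the   */
/* parameter estimates so the zeroed reference levels are visible.   */
proc glm data=data_class;
   class gender race condition;
   model age = risk_score living condition count_visits gender race
               median_income pop_density pct_rental / solution;
run;
quit;
```

With SOLUTION, PROC GLM reports an estimate of 0 for the last level of each CLASS variable in sort order and flags the affected estimates with a B, which is the same situation the PROC REG notes describe.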

Reeza
Super User

I mean that if you have both male and female flags, you include only one of them, not both. That is equivalent to using a single binary variable called sex where 0=Male and 1=Female.

The errors above are SAS correcting for this, and the estimates that remain appear correct, but I like to specify the coding myself in case the defaults aren't what I'd want to see, and to avoid error messages.

data_null__
Jade | Level 19

I don't see any ERROR messages? 

Reeza
Super User

Sorry, those are NOTEs, not ERRORs, in the log. The docs also say that the default parameterization is an overparameterized model.

http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_glm_sect030....

"There are more columns for these effects than there are degrees of freedom for them; in other words, PROC GLM is using an over-parameterized model."

Whether or not you should remove them is a matter of opinion I suppose.

abhik_giri
Calcite | Level 5

Hi,

I am getting the same error message:

Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

The structure of my model is like this:

sales=f(price,qtr_id)

qtr_id spans t4 to t11 (binary flags; in each row exactly one of them equals 1 and the rest are 0, like an identity matrix).

However, it says that t11 is a linear combination of intercept and t4-t10.

Any reason why this is happening?

statguy22
Calcite | Level 5

It is the same problem as above. If your data runs from t4 to t11, you must leave out one of the periods; otherwise, you have overparameterized your model. Simply leave out the flag for, say, t4, and SAS will estimate your model correctly.
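
A minimal sketch of that suggestion, assuming the quarter flags are variables named t4 through t11 as described in the question. The dataset name sales_data is hypothetical.

```sas
/* sales_data is a hypothetical dataset name.                        */
/* t4 is left out and serves as the baseline quarter; t5-t11 is a    */
/* numbered range list covering the remaining seven quarter flags.   */
proc reg data=sales_data;
   model sales = price t5-t11;
run;
quit;
```

The coefficients on t5 through t11 are then differences from the t4 baseline, and the intercept absorbs the t4 level.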

DavidJ
Calcite | Level 5

I have been using SAS for close to 18 years but have just begun dabbling in SAS/STAT. I'll take a look at the webpage you suggested. Thanks!
