I am getting the following notes using PROC REG in my output:
Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
other = Intercept - lung - heart - esrd
female = Intercept - male
other_race = Intercept - white_race - black_race - hisp_race
Each of the three groups of variables in the model is a set of flags, where one and only one variable in the group has a value of 1 and the others are 0 (for example, if male is 1, female is 0, and vice versa). The other variables listed in the MODEL statement are non-binary.
Code looks as follows:
proc reg data=data alpha=.05;
model age=risk_score living lung heart esrd other count_visits male female white_race black_race hisp_race other_race median_income pop_density pct_rental;
plot predicted.*residual. / name=Graph1;
run;
quit;
I am not a statistician, so I can't really explain to my client what is happening. Why is SAS setting the "other" values to 0? I am getting the same behavior (same MODEL) using PROC AUTOREG.
You've overparameterized your model. If you have a categorical variable such as male/female, include only one of the flags in the model, not both; otherwise the second one gets set to 0. The same issue applies to race and to the lung/heart/esrd/other group.
You can check any intro regression text on how to code categorical variables or see here:
http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter5/sasreg5.htm
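As a rough sketch of what that looks like for the model in the question, here is the same PROC REG step with one flag dropped from each group. Which flag to drop is an assumption on my part (I picked the three that SAS already set to 0); any one flag per group would do. The fitted values should be the same as before, and each remaining flag's coefficient is then read as the difference from the omitted group.

```sas
/* Sketch only: the question's model with one flag per group left out. */
/* The omitted flag in each group becomes the reference level.         */
proc reg data=data alpha=.05;
   model age = risk_score living
               lung heart esrd                   /* "other" omitted      */
               count_visits
               male                              /* "female" omitted     */
               white_race black_race hisp_race   /* "other_race" omitted */
               median_income pop_density pct_rental;
   plot predicted.*residual. / name=Graph1;
run;
quit;
```

With this specification the "Model is not full rank" note should no longer appear, provided there are no other exact linear dependencies among the remaining variables.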
Do those variables really need to be removed? I don't think so. This is parameterized the same way GLM would if GENDER and RACE were CLASS variables.
I mean that if you have male and female flags, you include only one of them, not both. That is equivalent to having a single binary variable called sex where 0=Male and 1=Female.
The errors above are SAS correcting for this, and the estimates that are left in appear correct, but I like to specify the coding in case the defaults aren't what I'd want to see. And to avoid error messages.
I don't see any ERROR messages?
Sorry, NOTES not ERRORS in the log. The docs do say the defaults are an overparameterized model as well.
"There are more columns for these effects than there are degrees of freedom for them; in other words, PROC GLM is using an over-parameterized model."
Whether or not you should remove them is a matter of opinion I suppose.
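For comparison, here is a minimal sketch of the CLASS-variable approach mentioned above. It assumes the data set also holds single categorical variables, which I am calling sex, race, and condition; those names are hypothetical and do not appear in the original data. PROC GLM builds the flag columns itself and sets the last level of each CLASS variable to 0, which is the same over-parameterized layout discussed in this thread.

```sas
/* Sketch only: sex, race, and condition are hypothetical categorical  */
/* variables standing in for the three groups of 0/1 flags.            */
proc glm data=data;
   class sex race condition;
   model age = risk_score living count_visits
               sex race condition
               median_income pop_density pct_rental / solution;
run;
quit;
```

The SOLUTION option prints the parameter estimates, with the last level of each CLASS variable shown as 0 and the estimates flagged as not uniquely estimable.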
Hi,
I am getting the same error message:
Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
The structure of my model is like this:
sales=f(price,qtr_id)
qtr_id stretches from t4 to t11 (they are binary values, with only one of them equal to 1, rest 0 for a row - like identity matrix).
However, it says that t11 is a linear combination of intercept and t4-t10.
Any reason why this is happening?
It is the same problem as above. If your data runs from t4 to t11, you must leave out one of the periods; otherwise, you have overparameterized your model, because the t4 through t11 flags always sum to 1, which is exactly the intercept column. Simply leave out the qtr_id flag for, say, t4, and SAS will estimate your model correctly.
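A minimal sketch of that, assuming the quarter flags are variables named t4 through t11 in a data set I am calling sales_data (a hypothetical name), with t4 left out as the reference quarter:

```sas
/* Sketch only: sales_data is a hypothetical data set name.            */
/* t4 is omitted, so each of t5-t11 is estimated relative to t4.       */
proc reg data=sales_data;
   model sales = price t5-t11;
run;
quit;
```

Each quarter coefficient is then the estimated difference in sales from quarter t4 at a given price.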
I have been using SAS for close to 18 years but have just begun dabbling in SAS/STAT. I'll take a look at the webpage you suggested. Thanks!