Darren_Pham
Calcite | Level 5
Hi,
I'm trying to fit a model with categorical variables. I have 4 categorical variables in my estimation dataset; I ran PROC GLM and got the model.
Now I want to apply that model to a much bigger dataset. I couldn't do it with PROC SCORE because the variables are categorical. Each of those variables has about 40 discrete values, so making dummy variables may be painful. Any ideas on how I should do it?

My code:
proc GLM data=DataIn outstat=RegOut;
class A B C;
model ModelOut = A B C B*C/ solution;
output out=out p=yhat;
run;
quit;

What I wanna do is similar to this one (I couldn't do it because the variables are categorical)
proc score data=DataTest score=RegOut out=DataOut;
var A B C B*C;
run;
Any suggestions are highly appreciated.
Doc_Duke
Rhodochrosite | Level 12
Darren,

Although you could probably do this with TRANSPOSE and some matrix multiplication, I wonder if you need to re-think your question. The model you described, with just 3 predictor variables and one interaction, requires about 1700 degrees of freedom. Unless your DataIn dataset is very rich and has several hundred thousand observations, you will not be able to get a model that validates internally, let alone provides reasonable scoring.
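
As a rough check on that figure, assuming each of A, B and C has exactly 40 levels (the question only says "about 40"), the model degrees of freedom for the three main effects plus the B*C interaction work out to:

```latex
\underbrace{3\,(40-1)}_{A,\;B,\;C} \;+\; \underbrace{(40-1)(40-1)}_{B*C} \;=\; 117 + 1521 \;=\; 1638
```

That is consistent with the "about 1700" quoted above; the exact count depends on the true number of levels in each variable.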

Doc Muhlbaier
Duke
plf515
Lapis Lazuli | Level 10
In addition to Doc's excellent point: even if your sample is huge (say 5 million subjects), I'd wonder how you will interpret the results of an interaction of two categorical variables with 40 levels each. And I'd worry about how to tell which effects are real indicators of a population difference and which are random chance, unless you have strong a priori hypotheses.

Could you tell us a bit about what you are studying?
StatDave
SAS Super FREQ
You can use PROC LOGISTIC (or PROC GLMMOD) to create the dummy variables for you as discussed in this usage note: http://support.sas.com/kb/23217 . The answers to many questions can be found in the Samples and SAS Notes in our searchable knowledgebase, http://support.sas.com/kb. You can use the search engine there to find the answers you need.
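
For the PROC GLMMOD route, a minimal sketch might look like the following. It reuses the data set and variable names from the question (DataIn, ModelOut, A, B, C); the output data set names Design and Parms are placeholders chosen for illustration. GLMMOD names the design columns Col1, Col2, and so on, so the OUTPARM= data set is needed to see which column belongs to which effect and level.

```sas
/* Sketch only: DataIn, ModelOut, A, B and C are the names used in the question. */
/* Design and Parms are hypothetical output data set names.                      */
proc glmmod data=DataIn outdesign=Design outparm=Parms noprint;
   class A B C;
   model ModelOut = A B C B*C;
run;

/* Parms maps each design column (Col1, Col2, ...) to its effect and levels */
proc print data=Parms(obs=20);
run;
```

With about 40 levels per variable the design data set will be very wide, so it is worth checking Parms before printing or merging Design.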

Below is an example. Note that you should use the OUTDESIGN= and OUTDESIGNONLY options in PROC LOGISTIC since you only want it to create a data set, not try to fit a model. You also need the PARAM=GLM option to use the same dummy coding as PROC GLM.

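/* Simulate a small two-factor data set: a has 2 levels, b has 4, 2 replicates per cell */
data test;
  do a=1,2; do b=1 to 4; do rep=1,2;
    y=rannor(2342); output;
  end; end; end;
run;

/* Fit the model in PROC GLM and save its predicted values for comparison */
proc glm data=test;
  class a b;
  model y=a|b / solution;
  output out=outglm p=yhat;
run;
proc print data=outglm;
  var a b y yhat;
run;

/* Create the GLM-coded design (dummy) variables without fitting a model */
proc logistic data=test outdesign=od outdesignonly;
  class a b / param=glm;
  model y=a|b;
run;

/* Refit on the design variables; OUTEST= saves the parameter estimates */
proc reg data=od outest=oe;
  yhat: model y=a1--a2b4;
run;

/* Apply the saved estimates; the predictions are stored in yhat (the model label) */
proc score data=od score=oe out=preds type=parms;
  var a1--a2b4;
run;
proc print data=preds;
  var y yhat;
run;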
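
For scoring a separate data set directly, one alternative not mentioned in this thread is to save the fitted model with a STORE statement and score it with PROC PLM. This is a minimal sketch, assuming a SAS/STAT release that provides PROC PLM (9.22 or later) and assuming DataTest contains A, B and C coded the same way as in DataIn. The item store name GlmFit and the prediction variable yhat are placeholders.

```sas
/* Fit the model from the question and save it in an item store. */
/* GlmFit is a hypothetical item store name.                     */
proc glm data=DataIn;
   class A B C;
   model ModelOut = A B C B*C / solution;
   store out=GlmFit;
run;
quit;

/* Score the larger data set from the stored model; no dummy variables needed */
proc plm restore=GlmFit;
   score data=DataTest out=DataOut predicted=yhat;
run;
```

Because the item store keeps the CLASS level information from the fit, the scoring step works on the original categorical variables and the design columns never have to be built explicitly.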
