I'm trying to build a model with categorical variables. I have 4 categorical variables in my estimation dataset; I ran PROC GLM and got the model.
Now I want to apply that model to a much bigger dataset. I couldn't do it with PROC SCORE because the variables are categorical. Each of those variables has about 40 discrete values, so making dummy variables by hand may be painful. Any ideas on how I should do it?
proc glm data=DataIn outstat=RegOut;
   class A B C;
   model ModelOut = A B C B*C / solution;
   output out=out p=yhat;
run; quit;
What I want to do is something like this (which doesn't work, because the variables are categorical):
proc score data=DataTest score=RegOut out=DataOut;
   var A B C B*C;
run;
Any suggestions are highly appreciated.
Although you could probably do this with TRANSPOSE and some matrix multiplication, I wonder if you need to re-think your question. The model you described, with just 3 predictor variables and one interaction, requires roughly 1,600 to 1,700 degrees of freedom: with about 40 levels per variable, that is 3 x 39 = 117 for the main effects plus 39 x 39 = 1,521 for B*C. Unless your DataIn dataset is very rich and has several hundred thousand observations, you will not be able to get a model that validates internally, let alone one that provides reasonable scoring.
In addition to Doc's excellent point .... even if your sample is huge (say 5 million subjects) I'd wonder how you will interpret the results of an interaction of two categorical variables with 40 levels each. And I'd worry about how to tell which are real indicators of a population difference and which are random chance, unless you have strong a priori hypotheses.
Could you tell us a bit about what you are studying?
You can use PROC LOGISTIC (or PROC GLMMOD) to create the dummy variables for you as discussed in this usage note: http://support.sas.com/kb/23217 . The answers to many questions can be found in the Samples and SAS Notes in our searchable knowledgebase, http://support.sas.com/kb. You can use the search engine there to find the answers you need.
Below is an example. Note that you should use the OUTDESIGN= and OUTDESIGNONLY options in PROC LOGISTIC, since you only want it to create a data set, not try to fit a model. You also need the PARAM=GLM option to use the same dummy coding as PROC GLM.
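data test;
   do a=1,2; do b=1 to 4; do rep=1,2;
      /* assumption: the response assignment is missing from the post; any numeric y will do */
      y=rannor(12345); output;
   end; end; end;
run;
proc glm data=test;
   class a b;
   model y=a|b / solution;
   output out=outglm p=yhat;
run; quit;
proc print data=outglm;
   var a b y yhat;
run;

/* create the GLM-coded design matrix only; no model is fit here */
proc logistic data=test outdesign=od outdesignonly;
   class a b / param=glm;
   model y=a|b;
run;
proc reg data=od outest=oe;
   yhat: model y=a1--a2b4;
run; quit;
proc score data=od score=oe out=preds type=parms;
   var a1--a2b4;
run;
proc print data=preds;
   var y yhat;
run;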
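If your release has the STORE statement and PROC PLM (SAS/STAT 9.22 or later, as far as I know), another route that avoids building dummy variables yourself is to save the fitted model and score the new data from it. This is a different technique from the OUTDESIGN= approach above and is not taken from the usage note. A minimal sketch using the data set names from the original question; GlmFit is a made-up item-store name:

```sas
/* Sketch only: fit on DataIn and save the model in an item store.
   GlmFit is a hypothetical name; assumes SAS/STAT 9.22 or later. */
proc glm data=DataIn;
   class A B C;
   model ModelOut = A B C B*C / solution;
   store GlmFit;
run; quit;

/* Score the larger data set from the stored model.
   DataTest must contain A, B and C; yhat holds the predicted values. */
proc plm restore=GlmFit;
   score data=DataTest out=DataOut predicted=yhat;
run;
```

The stored model carries the CLASS level information from the fit, so DataTest only needs the original A, B and C columns. Levels in DataTest that never appeared in DataIn cannot be scored and should come back with a missing yhat.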