Something similar to Proc Score for Categorical variables

Posted 06-01-2009 11:52 AM
(2186 views)

Hi,

I'm trying to fit a model with categorical variables. I have 4 categorical variables in my estimation dataset; I run PROC GLM and get the model.

Now I want to apply that model to a much bigger dataset. I couldn't do it with PROC SCORE because the variables are categorical, and each of those variables has about 40 discrete values, so making dummy variables may be painful. Any ideas on how I should do it?

My code:

proc GLM data=DataIn outstat=RegOut;
  class A B C;
  model ModelOut = A B C B*C / solution;
  output out=out p=yhat;
run;
quit;

What I wanna do is similar to this one (I couldn't do it because the variables are categorical)

proc score data=DataTest score=RegOut out=DataOut;
  var A B C B*C;
run;

Any suggestions are highly appreciated.

3 Replies

Darren,

Although you could probably do this with TRANSPOSE and some matrix multiplication, I wonder if you need to re-think your question. The model you described, with just 3 predictor variables and one interaction, requires about 1700 degrees of freedom. Unless your DataIn dataset is very rich and has several hundred thousand observations, you will not be able to get a model that validates internally, let alone provides reasonable scoring.

Doc Muhlbaier

Duke
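
The degrees-of-freedom estimate above can be checked with a short DATA step. This is only a sketch: it assumes each of A, B and C has exactly 40 levels, as the question suggests, and the variable names (levels, main_df, inter_df, total_df) are illustrative.

```sas
data _null_;
  levels   = 40;                   /* assumed number of levels per CLASS variable */
  main_df  = 3 * (levels - 1);     /* main effects A, B and C                     */
  inter_df = (levels - 1) ** 2;    /* B*C interaction                             */
  total_df = main_df + inter_df;   /* model degrees of freedom, excluding the intercept */
  put main_df= inter_df= total_df=;
run;
```

Under that assumption the total is 117 + 1521 = 1638 model degrees of freedom, in line with the rough figure of 1700.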

In addition to Doc's excellent point: even if your sample is huge (say, 5 million subjects), I'd wonder how you will interpret the results of an interaction of two categorical variables with 40 levels each. And I'd worry about how to tell which effects are real indicators of a population difference and which are random chance, unless you have strong a priori hypotheses.

Could you tell us a bit about what you are studying?

You can use PROC LOGISTIC (or PROC GLMMOD) to create the dummy variables for you as discussed in this usage note: http://support.sas.com/kb/23217 . The answers to many questions can be found in the Samples and SAS Notes in our searchable knowledgebase, http://support.sas.com/kb. You can use the search engine there to find the answers you need.

Below is an example. Note that you should use the OUTDESIGN= and OUTDESIGNONLY options in PROC LOGISTIC, since you only want it to create a data set, not try to fit a model. You also need the PARAM=GLM option to use the same dummy coding as PROC GLM.

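/* Simulated data: 2 levels of a, 4 levels of b, 2 replicates per cell */
data test;
  do a=1,2; do b=1 to 4; do rep=1,2;
    y=rannor(2342); output;
  end; end; end;
run;

/* Fit the model in PROC GLM to get reference predictions */
proc glm data=test;
  class a b;
  model y=a|b / solution;
  output out=outglm p=yhat;
run;

proc print data=outglm;
  var a b y yhat;
run;

/* Create the GLM-coded design matrix only; no model is fitted here */
proc logistic data=test outdesign=od outdesignonly;
  class a b / param=glm;
  model y=a|b;
run;

/* Refit on the dummy variables and save the estimates for PROC SCORE */
proc reg data=od outest=oe;
  yhat: model y=a1--a2b4;
run;

/* Apply the saved estimates to the design matrix */
proc score data=od score=oe out=preds type=parms;
  var a1--a2b4;
run;

proc print data=preds;
  var y yhat;
run;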
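
The example above scores the same data that was used for fitting. For the original question, where DataTest is a separate data set, one possible alternative is to stack the two data sets and let PROC GLM produce the predictions itself: observations with a missing response are left out of the fit but still receive predicted values in the OUTPUT data set. The sketch below assumes DataTest contains A, B and C under the same names as DataIn; Combined, Scored and to_score are illustrative names that do not appear in the thread.

```sas
/* Stack the estimation data and the data to be scored */
data Combined;
  set DataIn DataTest(in=isTest);
  to_score = isTest;              /* keep a permanent flag for the rows to be scored  */
  if isTest then ModelOut = .;    /* missing response: row is excluded from the fit   */
run;

/* Fit on the DataIn rows; the OUTPUT data set holds predictions for all rows */
proc glm data=Combined;
  class A B C;
  model ModelOut = A B C B*C / solution;
  output out=Scored p=yhat;
run;
quit;

/* Keep only the scored rows */
data DataOut;
  set Scored;
  where to_score = 1;
run;
```

This avoids building dummy variables by hand, at the cost of rerunning the fit on the larger combined data set. Predictions for rows whose levels of A, B or C never occur in DataIn should not be trusted, since the corresponding effects cannot be estimated from the fitting data. In SAS 9.22 and later, the STORE statement in PROC GLM combined with the SCORE statement in PROC PLM is another route.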
