- Something similar to Proc Score for Categorical va...

06-01-2009 11:52 AM

Hi,

I'm trying to fit a model with categorical variables. I have 4 categorical variables in my estimation dataset; I run PROC GLM and get the model.

Now I want to apply that model to a much bigger dataset. I couldn't do it with PROC SCORE because the variables are categorical, and each of them has about 40 discrete values, so making dummy variables by hand may be painful. Any ideas on how I should do it?

My code:

proc GLM data=DataIn outstat=RegOut;
  class A B C;
  model ModelOut = A B C B*C / solution;
  output out=out p=yhat;
run;
quit;

What I want to do is similar to this (I couldn't do it because the variables are categorical):

proc score data=DataTest score=RegOut out=DataOut;
  var A B C B*C;
run;

Any suggestions are highly appreciated.


Posted in reply to Darren_Pham

06-01-2009 12:29 PM

Darren,

Although you could probably do this with TRANSPOSE and some matrix multiplication, I wonder if you need to re-think your question. The model you described, with just 3 predictor variables and one interaction, requires about 1700 degrees of freedom. Unless your DataIn dataset is very rich and has several hundred thousand observations, you will not be able to get a model that validates internally, let alone provides reasonable scoring.

Doc Muhlbaier

Duke
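
For reference, the degrees-of-freedom figure above can be checked with a little arithmetic. This is only a sketch: it assumes exactly 40 levels for each of A, B and C (the original post says "about 40"), and the variable names in the DATA step are made up for illustration.

```sas
/* Model degrees of freedom for A B C B*C, assuming 40 levels per CLASS variable */
data _null_;
  levels   = 40;                            /* assumed number of levels per variable */
  df_main  = levels - 1;                    /* 39 for each of A, B and C */
  df_bc    = (levels - 1) * (levels - 1);   /* 1521 for the B*C interaction */
  df_model = 3 * df_main + df_bc;           /* 117 + 1521 = 1638 */
  put df_main= df_bc= df_model=;            /* writes the three values to the log */
run;
```

Under that assumption the model uses 1638 degrees of freedom (1639 with the intercept), which is in line with the "about 1700" estimate.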


Posted in reply to Darren_Pham

06-02-2009 07:47 AM

In addition to Doc's excellent point .... even if your sample is huge (say 5 million subjects) I'd wonder how you will interpret the results of an interaction of two categorical variables with 40 levels each. And I'd worry about how to tell which are real indicators of a population difference and which are random chance, unless you have strong a priori hypotheses.

Could you tell us a bit about what you are studying?


Posted in reply to Darren_Pham

07-07-2009 02:13 PM

You can use PROC LOGISTIC (or PROC GLMMOD) to create the dummy variables for you as discussed in this usage note: http://support.sas.com/kb/23217 . The answers to many questions can be found in the Samples and SAS Notes in our searchable knowledgebase, http://support.sas.com/kb. You can use the search engine there to find the answers you need.

Below is an example. Note that you should use the OUTDESIGN= and OUTDESIGNONLY options in PROC LOGISTIC, since you only want it to create a data set, not fit a model. You also need the PARAM=GLM option to get the same dummy coding as PROC GLM.

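/* Simulated data: 2 levels of a, 4 levels of b, 2 replicates per cell */
data test;
  do a=1,2;
    do b=1 to 4;
      do rep=1,2;
        y=rannor(2342);
        output;
      end;
    end;
  end;
run;

/* Reference fit: predicted values straight from PROC GLM */
proc glm data=test;
  class a b;
  model y=a|b / solution;
  output out=outglm p=yhat;
run;

proc print data=outglm;
  var a b y yhat;
run;

/* Build the GLM-coded dummy variables without fitting a model */
proc logistic data=test outdesign=od outdesignonly;
  class a b / param=glm;
  model y=a|b;
run;

/* Refit on the dummy variables and save the estimates */
proc reg data=od outest=oe;
  yhat: model y=a1--a2b4;
run;

/* Score the design data set with the saved estimates */
proc score data=od score=oe out=preds type=parms;
  var a1--a2b4;
run;

proc print data=preds;
  var y yhat;
run;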
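
The example above scores the same observations that were used to fit the model. One common way to get predictions for new observations without building dummy variables at all is to append the new rows to the estimation data with the response set to missing: PROC GLM leaves those rows out of the fit, but its OUTPUT statement still computes predicted values for them. The sketch below is not from the usage note; the data set names `newdata`, `both` and `scored` and the flag `to_score` are hypothetical, and it reuses the `test` data set created above.

```sas
/* Hypothetical new data to score: same CLASS variables, response unknown */
data newdata;
  do a = 1, 2;
    do b = 1 to 4;
      y = .;               /* missing response keeps these rows out of the fit */
      output;
    end;
  end;
run;

/* Stack the estimation data and the rows to be scored */
data both;
  set test(in=in_fit) newdata;
  to_score = not in_fit;   /* 1 for rows from newdata, 0 for rows from test */
run;

/* Fit on the rows with a response; predict for every row */
proc glm data=both;
  class a b;
  model y = a|b / solution;
  output out=scored p=yhat;
run;
quit;

/* Keep only the predictions for the new rows */
proc print data=scored;
  where to_score;
  var a b yhat;
run;
```

This refits the model every time new data arrive, and levels that appear only in the new data cannot be scored meaningfully because they contribute nothing to the estimates. Later SAS/STAT releases also offer a STORE statement in PROC GLM together with the SCORE statement in PROC PLM, which may be a better fit when the estimation data set is large.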
