BookmarkSubscribeRSS Feed
RobF
Quartz | Level 8

Is it possible to create dummy variables with proc transpose using the prefix option and id statement?

 

Using the prefix=race_ and prefix=hospital_ options and id statement, I've created the proper dummy variable names but I can't seem to assign the proper values to the new variables.

 

I'd like the following dataset:

 

data test;
input membno race $ hospital $10.;
cards;
1 white capital
2 black capital
3 hispanic capital
4 asian capital
5 native stmary
6 white stmary
7 black stmary
;
run;

 

 

To be transformed as follows:

membno race_asian race_black race_hispanic race_native race_white hospital_capital hospital_stmary
1
2
3
4
5
6
7

where each member # is assigned the correct race and hospital with a value of some kind (for example, values of "." and " " in place of 1 and 0 would be fine).

 

 

Thanks

 

 

13 REPLIES 13
Reeza
Super User

In your data step add a variable, either a one or a letter that is constant and use that in the VAR statement in your proc transpose.

 

data have;

set have;

Constant='X';

run;

 

I don't know that you can do both variables in a single proc transpose unfortunately. 

RobF
Quartz | Level 8

Thanks Reeza - like this?

 

data test;
input membno race $ hospital $10.;
constant=1;
cards;
1 white capital
2 black capital
3 hispanic capital
4 asian capital
5 native stmary
6 white stmary
7 black stmary
;
run;

proc sort data=test;
	by race;
run;
proc transpose data=test out=test_trans prefix=race_;
	by race;
	id race;
	var constant;
run;

 

The output dataset still isn't quite right, I'll fiddle around some more.

PGStats
Opal | Level 21

Here is how to do this with proc glmmod :

 


proc glmmod data=test outdesign=dum outparm=parm;
class race hospital;
model membno = race hospital;
run;

data _null_;
if _n_=1 then 
    call execute("data want; set dum; drop col1;");
set parm(firstobs=2) end=done;
call execute(cats("rename col",_colnum_,"=",effname,"_",race,hospital,";"));
if done then call execute("run;");
run;
PG
RobF
Quartz | Level 8

Thanks PGStats, very slick!

PGStats
Opal | Level 21

Added a few bells and whistles...

 

proc glmmod data=test outdesign=want outparm=parm noprint;
class race hospital;
model membno = race hospital / noint;
run;

data _null_;
if _n_=1 then 
    call execute("proc datasets noprint; modify want;");
set parm;
call execute(cats("label col", _colnum_, "='",
    propcase(effname), "=",
    propcase(race), propcase(hospital), "';"));
call execute(cats("rename col", _colnum_, "=", effname, "_",
    race, hospital, ";"));
run;
quit;

Note: the quit; statement at the end terminates the datasets procedure inserted with the execute calls.

PG
Rick_SAS
SAS Super FREQ

Slick, indeed.

 

But be careful to look at your data before applying this technique. In general the levels of the categorical variables might not form valid SAS variable names. For example, if race="African American" or hospital="St. Mary's", then this automated step will fail and you will need to write the RENAME= statements manually.

 

If you do go with PGStats's automatied method, I suggest making the QUIT statement part of the CALL EXECUTE chain, as he originally suggested. I was confused by having the dangling QUIT at the end.

 

data _null_;

...

set parm end=done;

...

if done then call execute("quit;");

run;

RobF
Quartz | Level 8

Agreed, good points.

 

I'm going to hold out on designating a correct answer for a bit longer & see if there's a sneaky way to use proc transpose to create the dummy variables.

sbxkoenk
SAS Super FREQ

Hello,

 

No PROC TRANSPOSE solution, sorry.

But with PROC GLMSELECT (unlike GLMMOD) you get the right (design-) variable names immediatly (no renaming needed)!

 

ods html close;

ods preferences;

ods html;

 

proc glmselect data=sashelp.prdsale namelen=50

               OUTDESIGN(ADDINPUTVARS FULLMODEL)=work.DesignVariables;

class        country region division ProdType Product / param=reference ShowCoding split;

model ACTUAL=country region division ProdType Product / /* STEPS=1 */ STOP=1;

output out=new;

run;

QUIT;

/* end of program */

 

Fit an intercept-only model (you must build a dummy model to get the design variables) for whatever target variable.

 

Cheers,

Koen

RobF
Quartz | Level 8

Thank you Koen -

 

So running the following code:

 

data test;
input membno y race $ hospital $10.;
cards;
1 0 white capital
2 1 black capital
3 1 hispanic capital
4 0 asian capital
5 0 native stmary
6 1 white stmary
7 0 black stmary
;
run;

ods html close;
ods preferences;
ods html;
proc glmselect data=test namelen=50 OUTDESIGN(ADDINPUTVARS FULLMODEL)=work.DesignVariables;
	class race hospital / param=reference ShowCoding split;
	model y = / /* STEPS=1 */ STOP=1;
	output out=new;
run;
QUIT;

 should generate the dummy coded variables in the "new" dataset, is that correct?

 

sbxkoenk
SAS Super FREQ

Hello RobF,

 

You forgot to specify the categorical regressors (class variables) in the right hand side of the model equation.

The dummy variables can be found in the dataset work.DesignVariables!! (I replaced new by _NULL_)

Take care: for p categories you get only p-1 dummies (the alphabetically last class level has no dummy and a 0-value for the p-1 created dummies = reference coding with default reference level). Maybe another parametrization scheme exists that does over-parametrization (p dummies for p categories as PROC GLM does)? You should have a look to the possibilities of the param= option in the CLASS statement of PROC GLMSELECT.

 

Here's the corrected code:

data test;

input membno y race $ hospital $10.;

cards;

1 0 white capital

2 1 black capital

3 1 hispanic capital

4 0 asian capital

5 0 native stmary

6 1 white stmary

7 0 black stmary

;

run;

ods html close;

ods preferences;

ods html;

proc glmselect data=test namelen=50 OUTDESIGN(ADDINPUTVARS FULLMODEL)=work.DesignVariables;

class race hospital / param=reference ShowCoding split;

model y = race hospital / /* STEPS=1 */ STOP=1;

output out=_NULL_;

run;

QUIT;

/* end of program */

 

 

Good luck,

Koen

 

PGStats
Opal | Level 21

Good to know, thanks! Playing with the code, it seems that good results are obtained with:

 


proc glmselect data=test namelen=50 noprint
    OUTDESIGN(FULLMODEL)=want;
class race hospital / param=glm;
model membno = race hospital / include=99 noint;
run;
PG
Rick_SAS
SAS Super FREQ

I think you'll be waiting a long time. PROC TRANSPOSE is not designed to create dummy variables.  It doesn't create new (0/1) data; it rearranges existing data.

RobF
Quartz | Level 8

Thanks for the suggestions, gentlemen - now which response to declare as the official solution?

 

Any one of the solutions would be worth a presentation at the next World SAS Forum if it hasn't already been covered. 🙂

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 6161 views
  • 9 likes
  • 5 in conversation