I need to create multiple dummy variables, for age, I need to create 3 dummy varibles, for group, 4, for race, 2, for edu, 3.
So I am thinking to use a loop macro way
%macro glan(demog);
%let var1 = group;
%let var2 = race;
%let var3=edu;
data want;
set have;
array edu(3) edu1-edu3;
do i = 1 to 3;
edu(i)=(educat=i);
end;
drop i;
run;
I guess I can use nested loop but not sure how to do it. The problem is, each variable has different catogories and need different numbers of dummy variables.
Any advice to make use of the array structure to make our coding more efficient? Thanks.
Why would you create dummy variables at all? GLM can handle this for you, with the CLASS statement:
Thank you. I want to use those as my confounding factors.
For example, for race, only black is related and will be considered as a confounding.
For group, one group II, will be considered as my confounding.
So my final model will be
Y=aX+ g(2)+ race(b)
So I do not want to add too many that are not related to my model
any advice?
If you really only need a handful of them, just hard-code them:
data want;
set have;
dummy1 = (group=2);
dummy2 = (race='black');
run;
You don't have to create all possible variables, just those that you will need in your equation.
Thanks. This is one solution, but not exactly what I want. I need to do multiple analysis, so those dummy variables can be generated for other analyses, not only limited to glm. There are different models. I really want to try the nested loop and macro. Or, if only one, either nested loop, or macro, can work, that will be also great.
Here are a few considerations, then.
Do you have any long variable names? It would be convenient to name the dummy variables after the originals, such as race_1, race_2, race_3, etc. But that can fail if the original variable names are too long.
Do the values of any variables contain more than one word?
Could any variables have a missing value? If so, should a dummy be assigned for the missing values?
The variable names some times are as long as 7 letters
yes, some values contain more than one word, like "HS diploma"
yes, some contain missing values. Missing values can be ommited, since that is a very small number.
thanks
One last question. Is it reasonable to ask you to call a macro with a separate list of character vs. numeric variables? For example:
%make_dummies (charlist=edu, numlist=age race)
If that is too much work, you would still need to somehow indicate which variables to use for creating dummies, and which to omit.
Basically, a macro will find all possible values for your variables, create the dummy variables, and label them.
And if Tom is suggesting GLMMOD, I would definitely look into that ... very credible source!
yes
%make_dummies (charlist=edu group race, numlist=age )
OK, this will be a bit clumsy.
%macro make_dummies (charlist=, numlist=);
%local i j nextvar;
proc freq data=have noprint;
%do i=1 %to %sysfunc(countw(&charlist &numlist);
%let nextvar = %scan(&charlist &numlist, &i);
tables &nextvar / out=&nextvar (keep=&nextvar);
%end;
run;
data _null_;
call execute ('data want; set have;');
%if %length(&charlist) %then %do i=1 %to %sysfunc(countw(&charlist));
%let nextvar = %scan(&charlist, &i);
counter=0;
do until (done&i);
set &nextvar (where=(&nextvar > ' ')) end=done&i;
counter + 1;
call execute(cats("&nextvar._", counter, "= (&nextvar=", &nextvar, ');',
"label &nextvar._", counter, "= '&nextvar';"));
end;
%end;
%** Similar loop for numerics;
call execute('run;');
stop;
run;
%mend make_dummies;
It's getting complicated enough that I thought I should stop here before writing the loop for numeric variables. It needs to be tested and debugged, and perhaps scrapped if Tom's solution is easier. So it can be tested using character variables only for now.
Thank you very much. It is very long and I am lost. Which should be replaced with my own variables? Which should be kept?
You would need to put your data into a data set named "have" since that name is hard-coded within the macro. (That part can always be changed later by convertting the data set name into a macro parameter.)
Then call the macro by naming a few character variables:
%make_dummies (charlist=name1 name2 name3)
Why don't you just use PROC GLMMOD to do it for you?
%let varlist=age sex ;
%let dsin=sashelp.class;
%let dsout=want ;
data step1 ;
_id +1;
set &dsin (keep=&varlist);
run;
proc glmmod noprint data=step1 outparm=step2 outdesign=step3 ;
class &varlist;
model _id=&varlist /noint;
run;
data step4 ;
set step2;
old=cats('COL',_colnum_);
new=cats(of effname &varlist);
run;
proc sql noprint;
select catx('=',old,new)
into :rename separated by ' '
from step4
;
quit;
data &dsout ;
set &dsin ;
set step3 (drop=_id rename=(&rename)) ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.