06-20-2016 11:34 AM
I need to create multiple dummy variables, for age, I need to create 3 dummy varibles, for group, 4, for race, 2, for edu, 3.
So I am thinking to use a loop macro way
%macro glan(demog); %let var1 = group; %let var2 = race; %let var3=edu; data want; set have; array edu(3) edu1-edu3; do i = 1 to 3; edu(i)=(educat=i); end; drop i; run;
I guess I can use nested loop but not sure how to do it. The problem is, each variable has different catogories and need different numbers of dummy variables.
Any advice to make use of the array structure to make our coding more efficient? Thanks.
06-20-2016 12:37 PM
Why would you create dummy variables at all? GLM can handle this for you, with the CLASS statement:
06-20-2016 12:43 PM
Thank you. I want to use those as my confounding factors.
For example, for race, only black is related and will be considered as a confounding.
For group, one group II, will be considered as my confounding.
So my final model will be
Y=aX+ g(2)+ race(b)
So I do not want to add too many that are not related to my model
06-20-2016 12:53 PM
If you really only need a handful of them, just hard-code them:
dummy1 = (group=2);
dummy2 = (race='black');
You don't have to create all possible variables, just those that you will need in your equation.
06-20-2016 12:59 PM - edited 06-20-2016 01:00 PM
Thanks. This is one solution, but not exactly what I want. I need to do multiple analysis, so those dummy variables can be generated for other analyses, not only limited to glm. There are different models. I really want to try the nested loop and macro. Or, if only one, either nested loop, or macro, can work, that will be also great.
06-20-2016 01:07 PM
Here are a few considerations, then.
Do you have any long variable names? It would be convenient to name the dummy variables after the originals, such as race_1, race_2, race_3, etc. But that can fail if the original variable names are too long.
Do the values of any variables contain more than one word?
Could any variables have a missing value? If so, should a dummy be assigned for the missing values?
06-20-2016 01:32 PM
The variable names some times are as long as 7 letters
yes, some values contain more than one word, like "HS diploma"
yes, some contain missing values. Missing values can be ommited, since that is a very small number.
06-20-2016 01:39 PM - edited 06-20-2016 01:41 PM
One last question. Is it reasonable to ask you to call a macro with a separate list of character vs. numeric variables? For example:
%make_dummies (charlist=edu, numlist=age race)
If that is too much work, you would still need to somehow indicate which variables to use for creating dummies, and which to omit.
Basically, a macro will find all possible values for your variables, create the dummy variables, and label them.
And if Tom is suggesting GLMMOD, I would definitely look into that ... very credible source!
06-20-2016 02:14 PM
OK, this will be a bit clumsy.
%macro make_dummies (charlist=, numlist=);
%local i j nextvar;
proc freq data=have noprint;
%do i=1 %to %sysfunc(countw(&charlist &numlist);
%let nextvar = %scan(&charlist &numlist, &i);
tables &nextvar / out=&nextvar (keep=&nextvar);
call execute ('data want; set have;');
%if %length(&charlist) %then %do i=1 %to %sysfunc(countw(&charlist));
%let nextvar = %scan(&charlist, &i);
do until (done&i);
set &nextvar (where=(&nextvar > ' ')) end=done&i;
counter + 1;
call execute(cats("&nextvar._", counter, "= (&nextvar=", &nextvar, ');',
"label &nextvar._", counter, "= '&nextvar';"));
%** Similar loop for numerics;
It's getting complicated enough that I thought I should stop here before writing the loop for numeric variables. It needs to be tested and debugged, and perhaps scrapped if Tom's solution is easier. So it can be tested using character variables only for now.
06-22-2016 09:00 AM
Thank you very much. It is very long and I am lost. Which should be replaced with my own variables? Which should be kept?
06-22-2016 09:07 AM
You would need to put your data into a data set named "have" since that name is hard-coded within the macro. (That part can always be changed later by convertting the data set name into a macro parameter.)
Then call the macro by naming a few character variables:
%make_dummies (charlist=name1 name2 name3)
06-20-2016 02:08 PM - edited 06-20-2016 02:09 PM
%let varlist=age sex ; %let dsin=sashelp.class; %let dsout=want ; data step1 ; _id +1; set &dsin (keep=&varlist); run; proc glmmod noprint data=step1 outparm=step2 outdesign=step3 ; class &varlist; model _id=&varlist /noint; run; data step4 ; set step2; old=cats('COL',_colnum_); new=cats(of effname &varlist); run; proc sql noprint; select catx('=',old,new) into :rename separated by ' ' from step4 ; quit; data &dsout ; set &dsin ; set step3 (drop=_id rename=(&rename)) ; run;