I have a variable with about 1,500 character categories, and I want to create dummy variables for these categories. Is there a procedure I can use to create these variables? Doing it manually is quite a tiresome task.
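%macro cat(indata, outdata, variable);
   %local mvals mdim _i _v;

   /* Collect the distinct levels of the variable into a |-delimited list. */
   proc sql noprint;
      select distinct &variable.
         into :mvals separated by '|'
         from &indata.;
      %let mdim = &sqlobs.;
   quit;

   /* Create one 0/1 dummy per level. Each level is used as the dummy's */
   /* name, so every level must be a valid SAS variable name.           */
   data &outdata.;
      set &indata.;
      %do _i = 1 %to &mdim.;
         %let _v = %scan(&mvals., &_i., |);
         if vtype(&variable.) = 'C' then do;
            if &variable. = "&_v." then &_v. = 1;
            else &_v. = 0;
         end;
         else do;
            if &variable. = &_v. then &_v. = 1;
            else &_v. = 0;
         end;
      %end;
   run;
%mend cat;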
That is how I have my macro set up.
To run it, pass the name of the input data set, the name of the output data set, and the variable that holds the values you are creating the dummy variables for:
%cat(have,want,variable)
Edited at 10:51 PDT. Forgot a ;
You will need to provide some more details. Are you looking to create one dummy for each level that appears in the variable? Multiple variables with 0/1 coding for some levels? Groups of like values?
You could provide some examples of what you are doing manually to give us some idea.
There are a couple of solutions here:
Hi,
Try proc glmmod with OUTDESIGN= to create dummy variables.
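A minimal sketch of what the PROC GLMMOD suggestion might look like. It assumes sashelp.class as stand-in data, with sex as the categorical variable and age as the numeric response that the MODEL statement requires even when only the design matrix is wanted. The output names want and parms are placeholders.

```sas
/* Sketch only: sashelp.class, sex, and age stand in for your own data. */
proc glmmod data=sashelp.class outdesign=want outparm=parms noprint;
   class sex;                 /* variable to expand into dummy columns */
   model age = sex / noint;   /* NOINT leaves out the intercept column */
run;

/* PARMS maps each design column (Col1, Col2, ...) to a level of sex. */
proc print data=parms; run;

/* WANT should hold the response plus the 0/1 design columns. */
proc print data=want(obs=5); run;
```

The design columns get generic names (Col1, Col2, ...), so the OUTPARM= data set is what tells you which level each column represents.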
In the above macro, I need to give the variable name manually. But in some scenarios I have, say, 400 variables, of which 90 are categorical. It is then very difficult to check and pick out all those variables.
Is there any code available to solve this kind of issue?
Thanks in advance.
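One possible way to avoid typing the names by hand, assuming the categorical variables are the ones stored as character, is to query DICTIONARY.COLUMNS. The library WORK and data set HAVE below are placeholders, and categories coded as numbers would not be picked up by this sketch.

```sas
/* Sketch only: WORK.HAVE is a placeholder for your own data set. */
%let charvars = ;   /* stays empty if no character variables are found */

proc sql noprint;
   select name
      into :charvars separated by ' '
      from dictionary.columns
      where libname = 'WORK'      /* library and member names are stored in upper case */
        and memname = 'HAVE'
        and type = 'char';        /* keep character variables only */
quit;

%put Character variables found: &charvars.;
```

The resulting list could then be looped over, calling %cat once per name. Note that levels shared by two variables would map to the same dummy name, so that case needs separate handling.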
It is easy for IML.
proc iml;
use sashelp.class;
read all var {sex};
close;
vnames=unique(sex);
d=design(sex);
create want from d[r=sex c=vnames];
append from d[r=sex];
close;
quit;
proc print;run;