I have a variable with about 1,500 character categories, and I want to create dummy variables for these categories. Is there a procedure I can use to create these variables? Doing it manually is quite a tiresome task.
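This is the way I have my macro set up:
%macro cat(indata, outdata, variable);
   %local mvals mdim _i _v;
   /* Collect the distinct values of the variable into one pipe-delimited list */
   proc sql noprint;
      select distinct &variable.
         into :mvals separated by '|'
         from &indata.;
      %let mdim = &sqlobs.;   /* number of distinct values found */
   quit;
   data &outdata.;
      set &indata.;
      /* One dummy per value. Each dummy is named after the value itself, */
      /* so every value must be a valid SAS variable name.                */
      %do _i = 1 %to &mdim.;
         %let _v = %scan(&mvals., &_i., |);
         if vtype(&variable.) = 'C' then do;
            /* character variable: compare against the quoted value */
            if &variable. = "&_v." then &_v. = 1;
            else &_v. = 0;
         end;
         else do;
            /* numeric variable: compare against the unquoted value */
            if &variable. = &_v. then &_v. = 1;
            else &_v. = 0;
         end;
      %end;
   run;
%mend cat;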
To run the macro, pass the name of the input dataset, the name of the output dataset, and the variable that holds the values you are creating the dummy variables for.
%cat(have,want,variable)
Edited at 10:51 PDT. Forgot a ;
You will need to provide some more details. Are you looking to create one level of dummy for each level that appears in the variable? Multiple variables with 0 / 1 coding for some levels? Groups of like values?
You could provide some examples of what you are doing manually to give us an idea.
There are a couple of solutions here:
Hi,
Try proc glmmod with OUTDESIGN= to create dummy variables.
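A minimal sketch of the PROC GLMMOD suggestion, not a definitive recipe. It assumes sashelp.class as a stand-in dataset and uses age only because GLMMOD needs a response variable on the MODEL statement; the output names want and colmap are placeholders. The design columns come out with generic names (Col1, Col2, ...), so the OUTPARM= dataset is kept to show which level each column stands for.

```sas
/* Sketch: one 0/1 column per level of sex, built by PROC GLMMOD.   */
/* Assumptions: sashelp.class stands in for the real data, and age  */
/* is used only because the MODEL statement requires a response.    */
proc glmmod data=sashelp.class
            outdesign=want     /* design matrix: Col1, Col2, ... plus the response */
            outparm=colmap     /* lookup: which level each column represents */
            noprint;
   class sex;
   model age = sex / noint;    /* NOINT leaves out the intercept column */
run;

/* Inspect the column lookup and the first few design rows */
proc print data=colmap; run;
proc print data=want(obs=5); run;
```

With 1,500 levels this produces 1,500 columns, so it is worth checking colmap before renaming or merging anything back onto the original data.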
In the above macro, I need to give the variable name manually. But in some scenarios I have, say, 400 variables, of which 90 are categorical. It is then very difficult to check and pick out all of those variables by hand.
Is there any code available to solve this kind of issue?
Thanks in advance.
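One possible way to avoid listing the variables by hand is to read the names from DICTIONARY.COLUMNS and pass the list on. The sketch below rests on a loud assumption: that "categorical" means "stored as character". Categorical variables stored as numbers would not be picked up this way. The dataset sashelp.class, the response age, and the names charvars, want, and colmap are stand-ins for illustration.

```sas
/* Sketch: gather every character column name into a macro variable. */
/* Assumption: the categorical variables are exactly the character   */
/* columns. LIBNAME and MEMNAME are stored in upper case.            */
proc sql noprint;
   select name
      into :charvars separated by ' '
      from dictionary.columns
      where libname = 'SASHELP'
        and memname = 'CLASS'
        and type = 'char';
quit;

%put NOTE: character variables found: &charvars;

/* Build dummy columns for all of them in one pass.                  */
/* age is only there because the MODEL statement needs a response.   */
proc glmmod data=sashelp.class outdesign=want outparm=colmap noprint;
   class &charvars;
   model age = &charvars / noint;
run;
```

For sashelp.class the list comes out as Name and Sex, so Name gets a dummy column per student as well; in practice the WHERE clause would need an extra condition to leave out identifier-like columns. As far as I know, GLMMOD also drops observations with missing values in the model variables, which is worth checking against the row count of the input.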
It is easy for IML.
proc iml;
use sashelp.class;
read all var {sex};
close;
vnames = unique(sex);
d = design(sex);
create want from d[r=sex c=vnames];
append from d[r=sex];
close;
quit;
proc print; run;