I am attempting to modify a macro shared here. I want to prepend a short string to each indicator variable's name to mark that it is an indicator and what kind of indicator it is.
Below is some test data and the modified macro. It appears to work well on character categorical variables such as "Category1", but not on numeric categorical variables such as "Category2": all of those indicators have the value 1 for every record. Any tips or ideas are appreciated.
* Import example dummy data ;
DATA TestData0;
LENGTH Category1 $ 9;
INPUT ID Category1 Category2;
DATALINES;
1 CategoryA 1
2 CategoryB 1
3 CategoryA 1
4 CategoryA 2
5 CategoryC 3
6 CategoryB 2
;
RUN;
* Write macro to generate indicators from categorical variables ;
%macro cat(indata,outdata,variable,abbr);
*Scrub special characters and ensure the categorical variable entries are properly formatted;
DATA CatTest;
SET &indata.;
tempvar = "IND_"||"&abbr."||SUBSTR(compress(&variable.,,'kad'),1,MIN(33-LENGTH("IND_")-%LENGTH("&abbr."),LENGTH(compress(&variable.,,'kad')))) ;
RUN;
* Actually generate the indicators ;
proc sql noprint;
select distinct tempvar
into :mvals separated by '|'
from CatTest;
%let mdim=&sqlobs;
quit;
* %PUT &mvals; * Log variable string for debugging ;
data &outdata.(DROP=tempvar);
set CatTest;
%do _i=1 %to &mdim.;
%let _v = %scan(&mvals., &_i., |);
if VType(&variable)='C' then do;
if tempvar = "&_v." then &_v. = 1;
else &_v = 0;
end;
else do;
if tempvar = &_v. then &_v. = 1;
else &_v = 0;
end;
%end;
run;
%mend;
* Run macro ;
%CAT(TestData0,TestData1,Category1,CAT1_);
%CAT(TestData1,TestData2,Category2,CAT2_);
Also, as I am still learning SAS, any general tips or suggestions on this macro are certainly welcome.
Thanks!
You probably do NOT want to use macro code for this problem. But if you do, you need to make sure not to mix your macro code with the SAS code that it is generating. For starters, add a data step that determines whether the variable is numeric or character; you can then use that value with macro logic to generate different code for numeric and character input variables.
data _null_;
set &indata ;
call symputx('vtype',vtype(&variable));
stop;
run;
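If you would rather not make an extra pass over the data, the type can also be read from the data set's metadata with the OPEN, VARNUM, VARTYPE and CLOSE functions called through %SYSFUNC. This is a minimal sketch; the macro name %var_type is hypothetical. It returns C or N, or a blank value if the data set cannot be opened or the variable does not exist, and unlike the SET/STOP step it also works on a data set with zero observations.

```sas
%macro var_type(ds, var);
  %* Hypothetical helper: returns C or N for &var in &ds, blank if not found ;
  %local dsid varnum type rc;
  %let type =;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid > 0 %then %do;
    %let varnum = %sysfunc(varnum(&dsid, &var));
    %if &varnum > 0 %then %let type = %sysfunc(vartype(&dsid, &varnum));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &type
%mend var_type;

/* Usage: SASHELP.CLASS has a character Sex and a numeric Age */
%let sex_type = %var_type(sashelp.class, Sex);
%let age_type = %var_type(sashelp.class, Age);
%put &=sex_type &=age_type;
```

Inside %CAT you could then write `%let vtype = %var_type(&indata, &variable);` instead of the DATA _NULL_ step.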
Later you can then conditionally generate the code that builds the names of the dummy variables, based on whether your input variable is character or numeric.
DATA CatTest;
SET &indata.;
length tempvar $32 ;
%if (&vtype=C) %then %do;
tempvar = "IND_&abbr" || compress(&variable,,'kad');
%end;
%else %do;
tempvar = "IND_&abbr" || compress(put(&variable,best%eval(32-%length(IND_&abbr)).),,'kad');
%end;
run;
Note that once you have generated TEMPVAR you know it is character (since it holds the name of the dummy variable), so your last data step can be much simpler: it no longer depends on the type of the original variable. You can also take advantage of the fact that SAS evaluates boolean expressions to 1 for true and 0 for false.
data &outdata (drop=tempvar);
set CatTest;
%do _i=1 %to &mdim;
%let _v = %scan(&mvals,&_i,|);
&_v = (tempvar="&_v") ;
%end;
run;
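Putting the pieces together, the whole macro might look like the sketch below. It is assembled from the snippets above, plus two additions of mine: a %LOCAL statement so the macro variables do not leak into the global symbol table, and the test data from the question so the block runs on its own. It is not hardened for every edge case; for example, a missing value produces a dummy whose name is just IND_ followed by the abbreviation.

```sas
/* Test data from the question */
data TestData0;
  length Category1 $ 9;
  input ID Category1 Category2;
  datalines;
1 CategoryA 1
2 CategoryB 1
3 CategoryA 1
4 CategoryA 2
5 CategoryC 3
6 CategoryB 2
;
run;

%macro cat(indata, outdata, variable, abbr);
  %local vtype mvals mdim _i _v;

  /* Step 1: is &variable character (C) or numeric (N)? */
  data _null_;
    set &indata;
    call symputx('vtype', vtype(&variable));
    stop;
  run;

  /* Step 2: macro logic picks the character or numeric version of the code */
  data CatTest;
    set &indata;
    length tempvar $32;
    %if &vtype = C %then %do;
      tempvar = "IND_&abbr" || compress(&variable,, 'kad');
    %end;
    %else %do;
      tempvar = "IND_&abbr" || compress(put(&variable, best%eval(32-%length(IND_&abbr)).),, 'kad');
    %end;
  run;

  /* Step 3: collect the distinct dummy-variable names */
  proc sql noprint;
    select distinct tempvar
      into :mvals separated by '|'
      from CatTest;
    %let mdim = &sqlobs;
  quit;

  /* Step 4: each dummy is 1 when the record's level matches, otherwise 0 */
  data &outdata (drop=tempvar);
    set CatTest;
    %do _i = 1 %to &mdim;
      %let _v = %scan(&mvals, &_i, |);
      &_v = (tempvar = "&_v");
    %end;
  run;
%mend cat;

%cat(TestData0, TestData1, Category1, CAT1_)
%cat(TestData1, TestData2, Category2, CAT2_)
```

With this version the numeric Category2 gets IND_CAT2_1, IND_CAT2_2 and IND_CAT2_3, each equal to 1 only on the matching records.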
Regardless of the solution, you can probably get along just fine without these variables. For example:
This may be helpful. Notice that at the end there are links to several other methods; I would recommend one of those approaches. Macros are great, but save them for when they are truly necessary. Many problems can be solved in multiple ways.
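As one example of a non-macro approach (not necessarily one of the linked methods), PROC TRANSREG with the DESIGN option can expand CLASS variables into 0/1 columns without any macro code. This is a sketch on SASHELP.CLASS so it runs anywhere; for the question's data you would substitute TestData0, Category1 Category2 and ID. The generated column names (typically the variable name followed by the level, such as SexF) and the extra bookkeeping variables dropped here are worth confirming with PROC CONTENTS on your SAS release.

```sas
/* Non-macro alternative: let PROC TRANSREG build the dummy variables */
proc transreg data=sashelp.class design;
  model class(Sex Age / zero=none);      /* zero=none: one dummy for every level */
  id Name;                               /* carry the row identifier to the output */
  output out=ClassDummies(drop=_: Int:); /* drop _TYPE_, _NAME_ and the intercept */
run;

proc contents data=ClassDummies varnum;
run;
```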
@ImSpartacus wrote:
One quick question: Could you elaborate on the "you need to make sure not to mix your macro code and the SAS code that it is generating" comment?
In this case the issue is that, within a single data step, a variable cannot be both character and numeric. SAS picks one type when it compiles the data step, based on how you first reference the variable. So to create a data step that can handle the variable being either character or numeric, you need to use macro logic
%if &vtype=C %then %do;
< code that treats X as character>
%end;
%else %do;
< code that treats X as numeric>
%end;
and not data step logic.
if vtype(x)='C' then do;
< code that treats X as character >
end;
else do;
< code that treats X as numeric >
end;