DATA Step, Macro, Functions and more

Macro To Create Indicator Variables Fails For Numerical Categorical Variables

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Macro To Create Indicator Variables Fails For Numerical Categorical Variables

I am attempting to modify a macro shared here. I wanted to concatenate a brief string to the front of each indicator variable's header to mark that it's an indicator and what kind of indicator it is. 

 

Below is some test data and the modified macro. It appears to work great on character categorical variables such as "Category1" below, but it doesn't work properly on numerical categorical variables such as "Category2" below (all of the indicators have values "1" for all records). Any tips/ideas are appreciated. 

 

 

* Import example dummy data ;

DATA TestData0;
	LENGTH Category1 $ 9;
	INPUT ID Category1 Category2;
	DATALINES;
1 CategoryA 1
2 CategoryB 1
3 CategoryA 1
4 CategoryA 2
5 CategoryC 3
6 CategoryB 2
;
RUN;

* Write macro to generate indicators from categorical variables ;

%macro cat(indata,outdata,variable,abbr);

*Scrub special characters and ensure the categorical variable entries are properly formatted;

DATA CatTest;
	SET &indata.;

	tempvar = "IND_"||"&abbr."||SUBSTR(compress(&variable.,,'kad'),1,MIN(33-LENGTH("IND_")-%LENGTH("&abbr."),LENGTH(compress(&variable.,,'kad')))) ;

RUN;

* Actually generate the indicators ;

  proc sql noprint;
   select distinct tempvar 
    into :mvals separated by '|'
    from CatTest;
    %let mdim=&sqlobs;
  quit;

  * %PUT &mvals; * Log variable string for debugging ;

  data &outdata.(DROP=tempvar);
    set CatTest;
    %do _i=1 %to &mdim.;
      %let _v = %scan(&mvals., &_i., |);
      if VType(&variable)='C' then do;
         if tempvar = "&_v." then &_v. = 1;
         else &_v = 0;
      end;
      else do;
         if tempvar = &_v. then &_v. = 1;
         else &_v = 0;
            end;
    %end;
  run;

%mend;

* Run macro ;

%CAT(TestData0,TestData1,Category1,CAT1_);
%CAT(TestData1,TestData2,Category2,CAT2_);

Also, as I am still learning SAS, any general tips or suggestions on this macro are certainly welcome. 

 

Thanks! 


Accepted Solutions
Solution
‎08-29-2017 03:42 PM
Super User
Super User
Posts: 8,127

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

Posted in reply to ImSpartacus

You probably do NOT want to be using macro code to do this problem.  But if you did then you need to make sure not to mix your macro code and the SAS code that it is generating.  For starters you should add a datastep to determine if the variable is numeric or character and then you can use that value with macro logic to generate different code for numeric an character input variables.

 

data _null_;
  set &indata ;
  call symputx('vtype',vtype(&variable));
  stop;
run;

So now later you could conditionally generate the code to generate the names for the dummy varaibles based on whether your input variable is character or numeric.

DATA CatTest;
  SET &indata.;
  length tempvar $32 ;
%if (&vtype=C) %then %do;
  tempvar = "IND_&abbr" || compress(&variable,,'kad');
%end;
%else %do;
  tempvar = "IND_&abbr" || compress(put(&variable,best%eval(32-%length(IND_&abbr)).),,'kad');
%end;
run;

 

Note that once you have generated the TEMPVAR you know it is character (since it is the name of the dummy varaible) so your last data step can be much simplier as it no longer depends on the type of the original variable. Also you can take advantage of the fact that SAS evaluates boolean expressions to 0 or 1 to indicate false or true, respectively. 

data &outdata (drop=tempvar);
  set CatTest;
%do _i=1 %to &mdim;
  %let _v = %scan(&mvals,&_i,|);
  &_v = (tempvar="&_v") ;
%end;
run;

 

View solution in original post


All Replies
Super User
Posts: 6,785

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

Posted in reply to ImSpartacus

Regardless of the solution, you can probably get along just fine without these variables.  For example:

 

  • Sort your data set by CATEGORY2, and run a procedure BY CATEGORY2.  Or,
  • Use CATEGORY2 in a CLASS statement (either to process each value separately, or to automatically create and utilize dummy variables such as you are trying to create ... the exact handling depends on the procedure that contains the CLASS statement).

 

Super User
Posts: 23,776

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

Posted in reply to ImSpartacus

This may be helpful. Notice at the end there are links to several other methods. IMO I would recommend one of these approaches. Macro's are great, but save them for when absolutely necessary. A large proportion of problems can be solved in multiple ways.

Solution
‎08-29-2017 03:42 PM
Super User
Super User
Posts: 8,127

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

Posted in reply to ImSpartacus

You probably do NOT want to be using macro code to do this problem.  But if you did then you need to make sure not to mix your macro code and the SAS code that it is generating.  For starters you should add a datastep to determine if the variable is numeric or character and then you can use that value with macro logic to generate different code for numeric an character input variables.

 

data _null_;
  set &indata ;
  call symputx('vtype',vtype(&variable));
  stop;
run;

So now later you could conditionally generate the code to generate the names for the dummy varaibles based on whether your input variable is character or numeric.

DATA CatTest;
  SET &indata.;
  length tempvar $32 ;
%if (&vtype=C) %then %do;
  tempvar = "IND_&abbr" || compress(&variable,,'kad');
%end;
%else %do;
  tempvar = "IND_&abbr" || compress(put(&variable,best%eval(32-%length(IND_&abbr)).),,'kad');
%end;
run;

 

Note that once you have generated the TEMPVAR you know it is character (since it is the name of the dummy varaible) so your last data step can be much simplier as it no longer depends on the type of the original variable. Also you can take advantage of the fact that SAS evaluates boolean expressions to 0 or 1 to indicate false or true, respectively. 

data &outdata (drop=tempvar);
  set CatTest;
%do _i=1 %to &mdim;
  %let _v = %scan(&mvals,&_i,|);
  &_v = (tempvar="&_v") ;
%end;
run;

 

Occasional Contributor
Posts: 9

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

I greatly appreciate your feedback (haven't tested, but it appears that this is probably how I should've structured it from the start).

One quick question: Could you elaborate on the "you need to make sure not to mix your macro code and the SAS code that it is generating" comment?
Super User
Super User
Posts: 8,127

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

[ Edited ]
Posted in reply to ImSpartacus

ImSpartacus wrote:


One quick question: Could you elaborate on the "you need to make sure not to mix your macro code and the SAS code that it is generating" comment?

In this case this issue is that within a single data step a variable cannot both be character and numeric. Based on how you first reference it SAS will pick one when compiling the data step.  So to be able to create a data step that can handle the variable being either character or numeric then you need to use macro logic

%if &vtype=C %then %do;
 < code that treats X as character>
%end;
%else %do;
 < code that treats X as character>
%end;

and not data step logic.

if vtype(x)='C' then do;
 < code that treats X as character >
end;
else do;
 < code that treats X as numeric >
end;
Occasional Contributor
Posts: 9

Re: Macro To Create Indicator Variables Fails For Numerical Categorical Variables

I appreciate the additional clarification.

I was aware that macro logic must be used in a macro and data step logic must be used in a data step, but I didn't know that the two wouldn't play nicely with each other.

Any recommended reading on the topic? I'm sure there are additional nuances that I've overlooked in my limited learning.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 267 views
  • 2 likes
  • 4 in conversation