DATA Step, Macro, Functions and more

Using a loop to create indicator variables from Categorical string Variable

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

Using a loop to create indicator variables from Categorical string Variable

I need a do loop that will read a categorical variable and then make an indicator variable for each category. Everything I've found on the web relies on the categorical variable being equal to some number which is not the case for me. I do not want to hard code these dummy variables because they will change from one data set to the next and the values will continually be different. Also, the number of categories can be in the 100s. 

 Cat3.JPG

 

Cat4.JPG

 

 

All help would be so greatly appreciated. 

 

The code below is the closest that I have gotten to getting this to work. It just doesn't work because of the numberical problem. 

 

 

 

 

%macro cat(indata, variable);
  proc sql noprint;
    select distinct &variable. into :mvals separated by '|'
    from &indata.;

    %let mdim=&sqlobs;
  quit;

  data &indata.;
    set &indata.;
    %do _i=1 %to &mdim.;
      %let _v = %scan(&mvals., &_i., |);
      if &variable. = &_v. then &variable.&_v. = 1; else &variable.&_v = 0;
    %end;
  run;
%mend;

%cat(people, income);

 


Accepted Solutions
Solution
‎04-08-2016 04:12 PM
Super User
Posts: 10,483

Re: Using a loop to create indicator variables from Categorical string Variable

For this approach to work you need to make sure that the length of text resulting from &variable.&_v.

never exceeds 32 characters. It also may not contain spaces or charcters other than letters, digits and _.

The function NVALID  would allow you validate whether &variable.&_v creates a valid name.

You may also run into problems creating a variable name that already exists in your dataset.

 

You can use the VTYPE function to execute code conditionally. You have an issue with how character and numeric variables would behave.

 

I am very leery of using:

Data &indata;

   Set &indata;

construct in this context as there is a potential for destroying existing data if the &variable.&_v duplicates an existing variable.

 

This code addresses the Character/Numeric issue.

%macro cat(indata, variable);
  proc sql noprint;
    select distinct &variable. into :mvals separated by '|'
    from &indata.;

    %let mdim=&sqlobs;
  quit;

  data &indata.;
    set &indata.;
    %do _i=1 %to &mdim.;
      %let _v = %scan(&mvals., &_i., |);
      if VType(&variable)='C' then do;
         if &variable. = "&_v." then &variable.&_v. = 1;
         else &variable.&_v = 0;
      end;
      Else do;
         if &variable. = &_v. then &variable.&_v. = 1;
         else &variable.&_v = 0;
      end;
    %end;
  run;
%mend;

View solution in original post


All Replies
Solution
‎04-08-2016 04:12 PM
Super User
Posts: 10,483

Re: Using a loop to create indicator variables from Categorical string Variable

For this approach to work you need to make sure that the length of text resulting from &variable.&_v.

never exceeds 32 characters. It also may not contain spaces or charcters other than letters, digits and _.

The function NVALID  would allow you validate whether &variable.&_v creates a valid name.

You may also run into problems creating a variable name that already exists in your dataset.

 

You can use the VTYPE function to execute code conditionally. You have an issue with how character and numeric variables would behave.

 

I am very leery of using:

Data &indata;

   Set &indata;

construct in this context as there is a potential for destroying existing data if the &variable.&_v duplicates an existing variable.

 

This code addresses the Character/Numeric issue.

%macro cat(indata, variable);
  proc sql noprint;
    select distinct &variable. into :mvals separated by '|'
    from &indata.;

    %let mdim=&sqlobs;
  quit;

  data &indata.;
    set &indata.;
    %do _i=1 %to &mdim.;
      %let _v = %scan(&mvals., &_i., |);
      if VType(&variable)='C' then do;
         if &variable. = "&_v." then &variable.&_v. = 1;
         else &variable.&_v = 0;
      end;
      Else do;
         if &variable. = &_v. then &variable.&_v. = 1;
         else &variable.&_v = 0;
      end;
    %end;
  run;
%mend;
Contributor
Posts: 32

Re: Using a loop to create indicator variables from Categorical string Variable

THANK YOU THANK YOU THANK YOU!

Super User
Posts: 5,256

Re: Using a loop to create indicator variables from Categorical string Variable

SInce you didin't posted any executable indata code, I'll just to paper examine your code.

What do you mean by numberical problem?

 

From what I see in the data step, I think you shouldn't use &Variable. in the assignments, just &_v, no?

 

Whenever your are trouble shooting macros, examine the logs after setting symbolgen, mprint ant potentially mlogic.

 

I know this is a foo example to simplify your post, but the data structure in your output makes little sense. I guess that you have a proper id variable somewhere. Either way, why keeping the original category var? I guess is a way to transpose your data for sttaistical/mining like analysis - but then you usually condense no of rows, which you can't if you keep your category var.

Data never sleeps
Respected Advisor
Posts: 3,777

Re: Using a loop to create indicator variables from Categorical string Variable

SAS has tools to make this easy.

 

proc transreg design data=sashelp.class;
   model class(name / zero=none cprefix=0);
   output out=design;
   run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 311 views
  • 3 likes
  • 4 in conversation