SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to creat a set of dummy variables from a rather mixed input?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

How to creat a set of dummy variables from a rather mixed input?

In my data, there's question like: "What would you often do at weekends?". There are a few available answers and respondents can choose multiple options. But the questionnaire design step wasn't great, so one response looks like this "1. Listen to music; 3. Read novels; 6. Hang-out with friends" while another is like "2. Watch movies; 3. Read novels; 4. Play games".

My question is whether there's a remedy which separates out this mixed answer and creates a set of dummy variables, so that the ouput looks like this:

ID Music Movie Novel Game Work Friend

01    1        0       1        0        0       1

02    0        1       1        1        0       0

Any thoughts are appreciated.


Accepted Solutions
Solution
‎03-06-2015 07:57 PM
Super User
Super User
Posts: 6,501

Re: How to creat a set of dummy variables from a rather mixed input?

Split the values into multiple observations and transpose back .

data response;

   id+1;

   input response $80.;

   cards4;


1. Listen to music; 3. Read novels; 6. Hang-out with friends

2. Watch movies; 3. Read novels; 4. Play games;

;;;;

data vertical ;

  set response ;

  item=response ;

  dummy=1;

  do _n_=1 by 1 until(item=' ');

    item = scan(response,_n_,';');

    if _n_=1 or item ne ' ' then output ;

  end;

run;

proc transpose data=vertical out=want(drop=_name_) ;

  by id ;

  id item ;

  var dummy;

run;

proc print; run;

View solution in original post


All Replies
Super Contributor
Posts: 336

Re: How to creat a set of dummy variables from a rather mixed input?

Assuming that the "1" in "1. Listen to music" sufficiently defines Music=1, etc.:

Data Have;
  Infile Datalines DSD;
  Input Text :$200.;
  Datalines4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games
;;;;;

Data Want (Drop=Text i);
  Set Have;
  ID=Put(_N_,Z2.);
  Array Dummy_Vars Music Movie Novel Game Work Friend;
  Do i=1 To Dim(Dummy_Vars);
    Dummy_Vars=IfN(Find(Text,Put(i,1.)),1,0);
  End;
Run;

Super User
Super User
Posts: 7,404

Re: How to creat a set of dummy variables from a rather mixed input?

Hi,

Whilst I agree with user24feb as to the solution to the set clause, I would also suggest the below scenario as it would be quite likely that you would have more categories than given, possibly not even knowing how many?  If so, splitting the results and then transposing may be easier to maintain than editing a list of values in an array.  The code below simply separates each value from the string into a new observation.  You can apply certain names to these, for instance if index(new_str,"music")>0 then "Music"...  and then transpose up.  Or you could just assign them colx to avoid the whole "knowing" what the columns are:

data have;

  str="1. Listen to music; 3. Read novels; 6. Hang-out with friends"; output;

  str="2. Watch movies; 3. Read novels; 4. Play games;"; output;

run;

data inter (drop=i str);

  set have;

  i=1;

  do while (scan(str,i,";") ne "");

    id=put(_n_,z2.);

    new_str=strip(scan(str,i,";"));

    col_no="COL"||strip(put(substr(new_str,1,1),best.));

    output;

    i=i+1;

  end;

run;

From this you can proc transpose by id, using col_no to identify columns.

Respected Advisor
Posts: 3,777

Re: How to creat a set of dummy variables from a rather mixed input?

Perhaps you can just search for the indicator variable names in the text of the response.

data response;
   input response $80.;
  
cards4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games;
;;;;
   run;
data indicator;
   set Response;
   array _r
  • Music Movie Novel Game Work Friend;
  •    do i = 1 to dim(_r);
          _r = not not find(response,vname(_r),'IT');
          end;
      
    drop i;
       run;
    proc print;
      
    run;
    3-6-2015 8-17-21 AM.png
    Solution
    ‎03-06-2015 07:57 PM
    Super User
    Super User
    Posts: 6,501

    Re: How to creat a set of dummy variables from a rather mixed input?

    Split the values into multiple observations and transpose back .

    data response;

       id+1;

       input response $80.;

       cards4;


    1. Listen to music; 3. Read novels; 6. Hang-out with friends

    2. Watch movies; 3. Read novels; 4. Play games;

    ;;;;

    data vertical ;

      set response ;

      item=response ;

      dummy=1;

      do _n_=1 by 1 until(item=' ');

        item = scan(response,_n_,';');

        if _n_=1 or item ne ' ' then output ;

      end;

    run;

    proc transpose data=vertical out=want(drop=_name_) ;

      by id ;

      id item ;

      var dummy;

    run;

    proc print; run;

    Frequent Contributor
    Posts: 75

    Re: How to creat a set of dummy variables from a rather mixed input?

    Thanks for all the help. I've not tried every suggested solution, but attempted to figure out one on my own and thought that a simple and potential solution may be:

    (1) Scan the response variable;

    (2) If the response contains "1", then variable "Listen_to_music" returns the value of "1". Otherwise,  "Listen_to_music" is "0"

    (3) Use the similar code at step (2) for other categories.

    Would you think it's an OK approach and what would you suggest for a program to do so?

    Super User
    Super User
    Posts: 6,501

    Re: How to creat a set of dummy variables from a rather mixed input?

    As long as the labels for the responses do not contain code that looks like the numbers it should work.

    data response;

      id+1;

      array items Listen Watch Read Play Number5 Hang_out ;

    input response $80.;

      do _n_=1 to dim(items);

         items(_n_) = 0< indexw(response,cats(_n_,'.'));

      end;

    cards4;

     

    1. Listen to music; 3. Read novels; 6. Hang-out with friends

    2. Watch movies; 3. Read novels; 4. Play games;

    ;;;;

    proc print;

      id id ;

    run;


                H                    

              N a                  

      L       u n          

      i W     m g          

      s a R P b _          

      t t e l e o          

    i e c a a r u          

    d n h d y 5 t  Response

    1 0 0 0 0 0 0

    2 1 0 1 0 0 1 1. Listen to music; 3. Read novels; 6. Hang-out with friends

    3 0 1 1 1 0 0 2. Watch movies; 3. Read novels; 4. Play games;


    New Contributor
    Posts: 3

    Re: How to creat a set of dummy variables from a rather mixed input?

    I have a very similar dataset that I'm trying to adapt this code to and running into issues with the dummy responses not matching up.  I have a file that I import where there is a store name (which I need to keep in the final data) and then a comma separated list of products (lets say there are 5 products for this example).

    Store1, Bread, Eggs, Juice

    Store2, Juice

    Store3, Eggs, Juice

    End result needs to be:

    Store   Bread   Eggs   Juice

    Store1    1           1       1

    Store2    0           0       1

    Store3    0           1       1

    I was able to get the code to run but once I looked at the dummy variables, they were not matching up.

    data response;

      input store :$20. v1 v2 v3 v4 v5;

    run;

    data indicator;

      set response;

      array items

  • $ bread milk juice eggs cereal;
  •   do i = 1 to dim(items);

          items = not not find(response,vname(items),'IT');

       end;

       drop i;

       run;

    proc print;

    run;

    Respected Advisor
    Posts: 3,777

    Re: How to creat a set of dummy variables from a rather mixed input?

    Read the log, look at the data.  Your input is not right.


    data response;
       infile cards truncover;
      
    input store :$20. response $60.;
      
    cards;
    Store1, Bread, Eggs, Juice
    Store2, Juice
    Store3, Eggs, Juice
    ;;;;
       run;
    proc print;
      
    run;
    data indicator;
       set response;
       array items
  • $ bread milk juice eggs cereal;
  •    do i = 1 to dim(items);
          items = not not find(response,vname(items),'IT');
          end;
      
    drop i;
       run;
    proc print;
      
    run;


    3-14-2015 5-47-44 AM.png
    New Contributor
    Posts: 3

    Re: How to creat a set of dummy variables from a rather mixed input?

    That helps.  I'm just learning this stuff so I am trying to learn on the fly.  I'm much closer now as it seems to be picking up the first product consistently.  My data actually is in the format of:

    Store 1,bread,eggs,juice

    Store 2,juice

    So I just need to figure out the formatting since it doesn't like the spaces and commas.  Thanks for the the help.

    Super User
    Super User
    Posts: 6,501

    Re: How to creat a set of dummy variables from a rather mixed input?

    You probably are having trouble with the step that is reading the text into a SAS dataset.

    To generate the two variable structure used in the solution you can mix list and formatted input modes.

    Read the lines using comma as a delimiter, but read the RESPONSE variable using a INFORMAT so that it ignores the commas and just reads the values.

    data response;
       infile CARDS dsd dlm=',' truncover;

       length store $20 response $60 ;

      input store response $60.;
    cards;
    Store 1,Bread, Eggs, Juice
    Store 2,Juice
    Store 3,Eggs, Juice
    ;;;;


    732  data _null_;

    733    set response ;

    734    file log dsd ;

    735    put store response ;

    736  run;

    Store 1,"Bread, Eggs, Juice"

    Store 2,Juice

    Store 3,"Eggs, Juice"

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 10 replies
    • 699 views
    • 6 likes
    • 6 in conversation