BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NonSleeper
Quartz | Level 8

In my data, there's question like: "What would you often do at weekends?". There are a few available answers and respondents can choose multiple options. But the questionnaire design step wasn't great, so one response looks like this "1. Listen to music; 3. Read novels; 6. Hang-out with friends" while another is like "2. Watch movies; 3. Read novels; 4. Play games".

My question is whether there's a remedy which separates out this mixed answer and creates a set of dummy variables, so that the ouput looks like this:

ID Music Movie Novel Game Work Friend

01    1        0       1        0        0       1

02    0        1       1        1        0       0

Any thoughts are appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Split the values into multiple observations and transpose back .

data response;

   id+1;

   input response $80.;

   cards4;


1. Listen to music; 3. Read novels; 6. Hang-out with friends

2. Watch movies; 3. Read novels; 4. Play games;

;;;;

data vertical ;

  set response ;

  item=response ;

  dummy=1;

  do _n_=1 by 1 until(item=' ');

    item = scan(response,_n_,';');

    if _n_=1 or item ne ' ' then output ;

  end;

run;

proc transpose data=vertical out=want(drop=_name_) ;

  by id ;

  id item ;

  var dummy;

run;

proc print; run;

View solution in original post

10 REPLIES 10
user24feb
Barite | Level 11

Assuming that the "1" in "1. Listen to music" sufficiently defines Music=1, etc.:

Data Have;
  Infile Datalines DSD;
  Input Text :$200.;
  Datalines4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games
;;;;;

Data Want (Drop=Text i);
  Set Have;
  ID=Put(_N_,Z2.);
  Array Dummy_Vars Music Movie Novel Game Work Friend;
  Do i=1 To Dim(Dummy_Vars);
    Dummy_Vars=IfN(Find(Text,Put(i,1.)),1,0);
  End;
Run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Whilst I agree with user24feb as to the solution to the set clause, I would also suggest the below scenario as it would be quite likely that you would have more categories than given, possibly not even knowing how many?  If so, splitting the results and then transposing may be easier to maintain than editing a list of values in an array.  The code below simply separates each value from the string into a new observation.  You can apply certain names to these, for instance if index(new_str,"music")>0 then "Music"...  and then transpose up.  Or you could just assign them colx to avoid the whole "knowing" what the columns are:

data have;

  str="1. Listen to music; 3. Read novels; 6. Hang-out with friends"; output;

  str="2. Watch movies; 3. Read novels; 4. Play games;"; output;

run;

data inter (drop=i str);

  set have;

  i=1;

  do while (scan(str,i,";") ne "");

    id=put(_n_,z2.);

    new_str=strip(scan(str,i,";"));

    col_no="COL"||strip(put(substr(new_str,1,1),best.));

    output;

    i=i+1;

  end;

run;

From this you can proc transpose by id, using col_no to identify columns.

data_null__
Jade | Level 19

Perhaps you can just search for the indicator variable names in the text of the response.

data response;
   input response $80.;
  
cards4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games;
;;;;
   run;
data indicator;
   set Response;
   array _r
  • Music Movie Novel Game Work Friend;
  •    do i = 1 to dim(_r);
          _r = not not find(response,vname(_r),'IT');
          end;
      
    drop i;
       run;
    proc print;
      
    run;
    3-6-2015 8-17-21 AM.png
    Tom
    Super User Tom
    Super User

    Split the values into multiple observations and transpose back .

    data response;

       id+1;

       input response $80.;

       cards4;


    1. Listen to music; 3. Read novels; 6. Hang-out with friends

    2. Watch movies; 3. Read novels; 4. Play games;

    ;;;;

    data vertical ;

      set response ;

      item=response ;

      dummy=1;

      do _n_=1 by 1 until(item=' ');

        item = scan(response,_n_,';');

        if _n_=1 or item ne ' ' then output ;

      end;

    run;

    proc transpose data=vertical out=want(drop=_name_) ;

      by id ;

      id item ;

      var dummy;

    run;

    proc print; run;

    NonSleeper
    Quartz | Level 8

    Thanks for all the help. I've not tried every suggested solution, but attempted to figure out one on my own and thought that a simple and potential solution may be:

    (1) Scan the response variable;

    (2) If the response contains "1", then variable "Listen_to_music" returns the value of "1". Otherwise,  "Listen_to_music" is "0"

    (3) Use the similar code at step (2) for other categories.

    Would you think it's an OK approach and what would you suggest for a program to do so?

    Tom
    Super User Tom
    Super User

    As long as the labels for the responses do not contain code that looks like the numbers it should work.

    data response;

      id+1;

      array items Listen Watch Read Play Number5 Hang_out ;

    input response $80.;

      do _n_=1 to dim(items);

         items(_n_) = 0< indexw(response,cats(_n_,'.'));

      end;

    cards4;

     

    1. Listen to music; 3. Read novels; 6. Hang-out with friends

    2. Watch movies; 3. Read novels; 4. Play games;

    ;;;;

    proc print;

      id id ;

    run;


                H                    

              N a                  

      L       u n          

      i W     m g          

      s a R P b _          

      t t e l e o          

    i e c a a r u          

    d n h d y 5 t  Response

    1 0 0 0 0 0 0

    2 1 0 1 0 0 1 1. Listen to music; 3. Read novels; 6. Hang-out with friends

    3 0 1 1 1 0 0 2. Watch movies; 3. Read novels; 4. Play games;


    Yankee1423
    Calcite | Level 5

    I have a very similar dataset that I'm trying to adapt this code to and running into issues with the dummy responses not matching up.  I have a file that I import where there is a store name (which I need to keep in the final data) and then a comma separated list of products (lets say there are 5 products for this example).

    Store1, Bread, Eggs, Juice

    Store2, Juice

    Store3, Eggs, Juice

    End result needs to be:

    Store   Bread   Eggs   Juice

    Store1    1           1       1

    Store2    0           0       1

    Store3    0           1       1

    I was able to get the code to run but once I looked at the dummy variables, they were not matching up.

    data response;

      input store :$20. v1 v2 v3 v4 v5;

    run;

    data indicator;

      set response;

      array items

  • $ bread milk juice eggs cereal;
  •   do i = 1 to dim(items);

          items = not not find(response,vname(items),'IT');

       end;

       drop i;

       run;

    proc print;

    run;

    data_null__
    Jade | Level 19

    Read the log, look at the data.  Your input is not right.


    data response;
       infile cards truncover;
      
    input store :$20. response $60.;
      
    cards;
    Store1, Bread, Eggs, Juice
    Store2, Juice
    Store3, Eggs, Juice
    ;;;;
       run;
    proc print;
      
    run;
    data indicator;
       set response;
       array items
  • $ bread milk juice eggs cereal;
  •    do i = 1 to dim(items);
          items = not not find(response,vname(items),'IT');
          end;
      
    drop i;
       run;
    proc print;
      
    run;


    3-14-2015 5-47-44 AM.png
    Yankee1423
    Calcite | Level 5

    That helps.  I'm just learning this stuff so I am trying to learn on the fly.  I'm much closer now as it seems to be picking up the first product consistently.  My data actually is in the format of:

    Store 1,bread,eggs,juice

    Store 2,juice

    So I just need to figure out the formatting since it doesn't like the spaces and commas.  Thanks for the the help.

    Tom
    Super User Tom
    Super User

    You probably are having trouble with the step that is reading the text into a SAS dataset.

    To generate the two variable structure used in the solution you can mix list and formatted input modes.

    Read the lines using comma as a delimiter, but read the RESPONSE variable using a INFORMAT so that it ignores the commas and just reads the values.

    data response;
       infile CARDS dsd dlm=',' truncover;

       length store $20 response $60 ;

      input store response $60.;
    cards;
    Store 1,Bread, Eggs, Juice
    Store 2,Juice
    Store 3,Eggs, Juice
    ;;;;


    732  data _null_;

    733    set response ;

    734    file log dsd ;

    735    put store response ;

    736  run;

    Store 1,"Bread, Eggs, Juice"

    Store 2,Juice

    Store 3,"Eggs, Juice"

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to connect to databases in SAS Viya

    Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

    Find more tutorials on the SAS Users YouTube channel.

    Discussion stats
    • 10 replies
    • 1558 views
    • 6 likes
    • 6 in conversation