BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
685932
Calcite | Level 5

Hello,

 

I'm in the process of cleaning a dataset (from a survey) using SAS 9.4 and I'm trying to figure out a more streamlined process for cleaning one of the variables (called Q12). For this question, people were allowed to select more than one response, and all of their choices were combined into one variable (Q12), separated by commas. There were 14 choices that they could select. An example of what the data looks like is below: 

 

ID       Q12 

1        1,2,3,4,5

2        4,11,13

3        6,7,8,9,10,11   

 

I am trying to re-code Q12 into 14 variables corresponding to each of the question choices. For example, I would create a new variable called Q12_1 for all of the people who selected choice 1 and it would be coded 0 or 1 (0= did not select, 1 = did select), and so on and so forth. I was going to use the INDEX function to pull the choice # (1-14) in Q12, and I could just do this 14 times, but I wanted a more efficient way of doing it. Here's the code I could use:

 

DATA TEST;
SET CLEAN_DATASET;

 

 

if index(Q12, "1") then Q12_1 = 1; 
else if Q12 = ' ' then Q12_1 = .;
else Q12_1 = 0;

 

RUN;

 

I could repeat this IF statement 14 times, swapping out the "1" for every choice #, but I want to learn how to do this using some kind of loop like an array to save myself time. However, I've been running into trouble because I can't figure out how to create a loop of the numbers 1-14 in the code below:        

 

 

DATA TEST;
SET CLEAN_DATASET;

 

array X [14] Q12_1 Q12_2 Q12_3 Q12_4 Q12_5 Q12_6 Q12_7 Q12_8 Q12_9 Q12_10 Q12_11 Q12_12 Q12_13 Q12_14;

 

do i = 1 to 14;

if index(Q12, "i") then X[i] = 1;
else if Q12 = ' ' then X[i] = .;
else X[i] = 0;

end;
Drop i;

 

RUN;

 

I seem to be running into trouble with the INDEX function. When I ran this, my newly created variables only contained 0's and blanks, instead of 1's as well. I also tried adding another DO loop and a temporary array but that didn't work either. Any suggestions??

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Ok. This gives you what you want 🙂

 

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11
4 .
5 1,9
6 .
;

data want(drop=i j idx);
   set have(rename=(Q12=_Q12));
   array Q{14} Q1-Q14 (14*0);

   put _Q12;

   do i=1 to countw(_Q12, ',');
      idx=input(scan(_Q12, i, ','), 8.);
      if idx in (1:14) then Q[idx]=1;
   end;

   if sum(of Q[*]) eq 0 then call missing(of Q[*]);

   output;

   do j=1 to dim(Q);
      Q[j]=0;
   end;
run;

View solution in original post

15 REPLIES 15
PeterClemmensen
Tourmaline | Level 20

Welcome to the SAS communities 🙂

 

There is probably a simpler way, but here is my approach

 

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11
;

data want(drop=i j idx);
   set have(rename=(Q12=_Q12));
   array Q{14} Q1-Q14 (14*0);

   do i=1 to countw(_Q12, ',');
      idx=input(scan(_Q12, i, ','), 8.);
      Q[idx]=1;
   end;

   output;

   do j=1 to dim(Q);
      Q[j]=0;
   end;
run;
685932
Calcite | Level 5

Thank you very much for your help!

 

I tried your code, but I got this error message:

 

25 GOPTIONS ACCESSIBLE;
26 Data TEST(drop = i j idx);
27 Set CLEAN_DATASET(rename = (Q12 = _Q12));
28 
29 Array Q[14] Q12_1-Q12_14 (14*0);
30 
31 do i = 1 to countw(_Q12, ',');
32 idx = input(scan(_Q12, i, ','), 8.);
33 Q[idx] = 1;
34 end;
35 
36 output;
37 
38 do j = 1 to dim(Q);
39 Q[j] = 0;
40 
41 end;
42 
43 run;

ERROR: Array subscript out of range at line 33 column 1.

PeterClemmensen
Tourmaline | Level 20

This would happen if your test answers contain values other than 1,2,3,4,5,6,7,8,9,10,11,12,13 and 14 ..

 

Which is not the case in your posted example data. But may be the case in your actual data. Am I correct?

PeterClemmensen
Tourmaline | Level 20

A quick fix that takes care of invalid data entries could be like below.

 

Notice how I purposely added an invalid response number of 15 in the third observation, which is now handled. However, I would probably clean the data properly at an earlier state.

 

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11,15
;

data want(drop=i j idx);
   set have(rename=(Q12=_Q12));
   array Q{14} Q1-Q14 (14*0);

   do i=1 to countw(_Q12, ',');
      idx=input(scan(_Q12, i, ','), 8.);
      if idx in (1:14) then Q[idx]=1;
   end;

   output;

   do j=1 to dim(Q);
      Q[j]=0;
   end;
run;

 

685932
Calcite | Level 5

Thank you, I will try this. The actual data contains missing values, which I am assuming is why there was an error? I want to keep the missing values as missing, and not code them as 0. 

PeterClemmensen
Tourmaline | Level 20

What does a missing value look like in this context? Two commas in a row? Please edit the data step that I created of your posted example data to shows us.

 

Also, how do you want to 'keep' the missing values here?

 

 

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11
;

685932
Calcite | Level 5

Sorry, I should have been more clear. A missing value in my dataset is someone who didn't answer the question at all. I want to be able to distinguish between those who didn't answer the question, and people who did answer the question but may not have selected a certain choice (I would code the choices they didn't select as 0). 

 

Here's an example:

data have;
input ID Q12 $20.;
datalines;1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11
4 .
5 1,9
6 .
;

 ID 4 and 6 is what a missing value would look like. If a person answered the question but didn't select a certain choice, the missing choices aren't listed (see ID 1,2,3, and 5). 

 

I want to code things like this:

 

For ID 5, for example, I would code Q12_1 and Q12_9 =1 and all the other variables would be coded as 0 (since this person answered the question). But for ID 6, I would code all the variables = missing. 

 

 

685932
Calcite | Level 5

Well that got messed up...sorry let me try that again

 

 

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11

4 .

5 1,9

6 .
;

PeterClemmensen
Tourmaline | Level 20

Ok. This gives you what you want 🙂

 

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11
4 .
5 1,9
6 .
;

data want(drop=i j idx);
   set have(rename=(Q12=_Q12));
   array Q{14} Q1-Q14 (14*0);

   put _Q12;

   do i=1 to countw(_Q12, ',');
      idx=input(scan(_Q12, i, ','), 8.);
      if idx in (1:14) then Q[idx]=1;
   end;

   if sum(of Q[*]) eq 0 then call missing(of Q[*]);

   output;

   do j=1 to dim(Q);
      Q[j]=0;
   end;
run;
685932
Calcite | Level 5

This worked! Thank you!!

PeterClemmensen
Tourmaline | Level 20

Anytime, glad to help 🙂

PeterClemmensen
Tourmaline | Level 20

I don't see an ID 4, 5 or 6 in your data?

 

Edit: I got it 🙂

novinosrin
Tourmaline | Level 20

data have;
input ID Q12 $20.;
datalines;
1 1,2,3,4,5
2 4,11,13
3 6,7,8,9,10,11
4 .
5 1,9
6 .
;
data want;
set have;
array t Q12_1-Q12_14 (14*0);
retain k;
if _n_ = 1 then k = peekclong(addrlong (t[1]), 112);
else  call pokelong(k, addrlong (t[1]), 112) ; 
do i=1 to countw(q12);
j=scan(q12,i);
t(input(j,8.))=1;
end;
if 1 not in t then call missing(of t(*));
drop k j i;
run;
685932
Calcite | Level 5

This also worked perfectly. Thank you for your help! 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1350 views
  • 2 likes
  • 4 in conversation