In my data, there's question like: "What would you often do at weekends?". There are a few available answers and respondents can choose multiple options. But the questionnaire design step wasn't great, so one response looks like this "1. Listen to music; 3. Read novels; 6. Hang-out with friends" while another is like "2. Watch movies; 3. Read novels; 4. Play games".
My question is whether there's a remedy which separates out this mixed answer and creates a set of dummy variables, so that the ouput looks like this:
ID Music Movie Novel Game Work Friend
01 1 0 1 0 0 1
02 0 1 1 1 0 0
Any thoughts are appreciated.
Split the values into multiple observations and transpose back .
data response;
id+1;
input response $80.;
cards4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games;
;;;;
data vertical ;
set response ;
item=response ;
dummy=1;
do _n_=1 by 1 until(item=' ');
item = scan(response,_n_,';');
if _n_=1 or item ne ' ' then output ;
end;
run;
proc transpose data=vertical out=want(drop=_name_) ;
by id ;
id item ;
var dummy;
run;
proc print; run;
Assuming that the "1" in "1. Listen to music" sufficiently defines Music=1, etc.:
Data Have;
Infile Datalines DSD;
Input Text :$200.;
Datalines4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games
;;;;;
Data Want (Drop=Text i);
Set Have;
ID=Put(_N_,Z2.);
Array Dummy_Vars Music Movie Novel Game Work Friend;
Do i=1 To Dim(Dummy_Vars);
Dummy_Vars=IfN(Find(Text,Put(i,1.)),1,0);
End;
Run;
Hi,
Whilst I agree with user24feb as to the solution to the set clause, I would also suggest the below scenario as it would be quite likely that you would have more categories than given, possibly not even knowing how many? If so, splitting the results and then transposing may be easier to maintain than editing a list of values in an array. The code below simply separates each value from the string into a new observation. You can apply certain names to these, for instance if index(new_str,"music")>0 then "Music"... and then transpose up. Or you could just assign them colx to avoid the whole "knowing" what the columns are:
data have;
str="1. Listen to music; 3. Read novels; 6. Hang-out with friends"; output;
str="2. Watch movies; 3. Read novels; 4. Play games;"; output;
run;
data inter (drop=i str);
set have;
i=1;
do while (scan(str,i,";") ne "");
id=put(_n_,z2.);
new_str=strip(scan(str,i,";"));
col_no="COL"||strip(put(substr(new_str,1,1),best.));
output;
i=i+1;
end;
run;
From this you can proc transpose by id, using col_no to identify columns.
Perhaps you can just search for the indicator variable names in the text of the response.
Split the values into multiple observations and transpose back .
data response;
id+1;
input response $80.;
cards4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games;
;;;;
data vertical ;
set response ;
item=response ;
dummy=1;
do _n_=1 by 1 until(item=' ');
item = scan(response,_n_,';');
if _n_=1 or item ne ' ' then output ;
end;
run;
proc transpose data=vertical out=want(drop=_name_) ;
by id ;
id item ;
var dummy;
run;
proc print; run;
Thanks for all the help. I've not tried every suggested solution, but attempted to figure out one on my own and thought that a simple and potential solution may be:
(1) Scan the response variable;
(2) If the response contains "1", then variable "Listen_to_music" returns the value of "1". Otherwise, "Listen_to_music" is "0"
(3) Use the similar code at step (2) for other categories.
Would you think it's an OK approach and what would you suggest for a program to do so?
As long as the labels for the responses do not contain code that looks like the numbers it should work.
data response;
id+1;
array items Listen Watch Read Play Number5 Hang_out ;
input response $80.;
do _n_=1 to dim(items);
items(_n_) = 0< indexw(response,cats(_n_,'.'));
end;
cards4;
1. Listen to music; 3. Read novels; 6. Hang-out with friends
2. Watch movies; 3. Read novels; 4. Play games;
;;;;
proc print;
id id ;
run;
H
N a
L u n
i W m g
s a R P b _
t t e l e o
i e c a a r u
d n h d y 5 t Response
1 0 0 0 0 0 0
2 1 0 1 0 0 1 1. Listen to music; 3. Read novels; 6. Hang-out with friends
3 0 1 1 1 0 0 2. Watch movies; 3. Read novels; 4. Play games;
I have a very similar dataset that I'm trying to adapt this code to and running into issues with the dummy responses not matching up. I have a file that I import where there is a store name (which I need to keep in the final data) and then a comma separated list of products (lets say there are 5 products for this example).
Store1, Bread, Eggs, Juice
Store2, Juice
Store3, Eggs, Juice
End result needs to be:
Store Bread Eggs Juice
Store1 1 1 1
Store2 0 0 1
Store3 0 1 1
I was able to get the code to run but once I looked at the dummy variables, they were not matching up.
data response;
input store :$20. v1 v2 v3 v4 v5;
run;
data indicator;
set response;
array items
do i = 1 to dim(items);
items = not not find(response,vname(items),'IT');
end;
drop i;
run;
proc print;
run;
Read the log, look at the data. Your input is not right.
That helps. I'm just learning this stuff so I am trying to learn on the fly. I'm much closer now as it seems to be picking up the first product consistently. My data actually is in the format of:
Store 1,bread,eggs,juice
Store 2,juice
So I just need to figure out the formatting since it doesn't like the spaces and commas. Thanks for the the help.
You probably are having trouble with the step that is reading the text into a SAS dataset.
To generate the two variable structure used in the solution you can mix list and formatted input modes.
Read the lines using comma as a delimiter, but read the RESPONSE variable using a INFORMAT so that it ignores the commas and just reads the values.
data response;
infile CARDS dsd dlm=',' truncover;
length store $20 response $60 ;
input store response $60.;
cards;
Store 1,Bread, Eggs, Juice
Store 2,Juice
Store 3,Eggs, Juice
;;;;
732 data _null_;
733 set response ;
734 file log dsd ;
735 put store response ;
736 run;
Store 1,"Bread, Eggs, Juice"
Store 2,Juice
Store 3,"Eggs, Juice"
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.