Hi , I am looking to rotate a SAS dataset (temp3) ,basically converting a tall dataset to a fat dataset. The dataset 'represents answers to each option within a question'. The original ids for the questions and its options were not labeled in a sequential/logical manner. I have new labels for each of the 'options' within a questionid using certain criteria represented by the variable name new_optionid. Copy-pasted below is how the dataset temp3 looks after sorting and stacking datasets by sequential order of questionid and optionid.
ID | questionid | optionid | answer | Name | Name_of_unit | sortid | new_questionid | new_optionid | new_sortid |
267899 | 26 | 837 | 1 | Alaska | xyz | 1 | 1 | Q0101C | 1 |
378990 | 27 | 467 | 0 | Alaska | xyz | 1 | 2 | Q0201C | 2 |
378991 | 27 | 519 | 0 | Alaska | xyz | 2 | 2 | Q0202C | 3 |
207320 | 27 | 520 | 0 | Alaska | xyz | 3 | 2 | Q0203C | 4 |
207330 | 27 | 521 | 1 | Alaska | xyz | 4 | 2 | Q0204C | 5 |
207340 | 27 | 522 | 1 | Alaska | xyz | 5 | 2 | Q0205C | 6 |
207350 | 27 | 523 | 1 | Alaska | xyz | 6 | 2 | Q0206C | 7 |
207360 | 58 | 170 | 1 | Alaska | xyz | 7 | 3 | Q0401C | 8 |
Questionid=26 and its optionid=837 corresponds to new_optionid Q0101C...the first 2 digits '01' in 'Q0101C' corresponds to questionid=26 and next 2 digits '01' in 'Q0101C' corresponds to optionid=837
Questionid=27 and its optionid=467 corresponds to new_optionid Q0201C...the first 2 digits '02' in 'Q0201C' corresponds to questionid=27 and next 2 digits '01' in 'Q0201C' corresponds to optionid=467..and so on
Since the dataset only houses non-missing answers, the new_optionid only contains only those answered ids . I also created a numeric field called new_sortid to use arrays with numeric data only. So, new_optionid='Q0101C' corresponds to new_sortid=1 and so on.
I need to have a final dataset that looks like this, where the responses in the answer column are below each new_optionid:
Name | Name_of_unit | Q0101C | Q0201C | Q0202C | Q0203C | Q0204C | Q0205C | Q0206C | Q0401C |
Alaska | xyz | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
I used an array (below is the code I began with)...but its not giving me what I need..could I use an array without creating a new_sortid. How can I incorporate the character value of new_optionid as column names? Please advice.
data temp4;
array s(9) s1-s9;
retain s1-s9;
set temp3;
by new_questionid;
s(new_sortid)=answer;
if last.new_questionid then output;
keep new_questionid s1-s9 name name_of_unit ;
run;
the above code gave me (which is making it more complicated).
s1 | s2 | s3 | s4 | s5 | s6 | s7..etc | new_questionid |
1 | . | . | . | . | . | . | 1 |
1 | 0 | 0 | 0 | 1 | 1 | 1 | 2 |
1 | 0 | 0 | 0 | 1 | 1 | 1 | 3 |
I'm not sure what your explanation with questionid and optionid other than having something to do with recapturing the original question 'names'. I suspect that logic to assign that value would be the first part but I don't get the "digits" as you give no idea what questionid 27 and optionis 519 through 522 mean.
And since Questionid 58 has an option id 170 what does that relate to?
Is the "meaning" of Q0201C something like Question 2, first category selected(if 1, not selected if 0?) and Q0202C Question 2, Second category (like Question 2 is a "mark all that apply" question)?
Do any of the OPTIONID values appear associated with more than one QUESTIONID? If not then it may be that questionid is redundant.
You don't show any relationship between the example raw data and the S variables in the example code. So no clue as why it isn't working or even what it may actually do.
And what happens to the first column just labeled ID?
If we can get something that looks close to that
@Ballardw..all right , I see what you mean. Its hard to explain as a lot of data management has been done to arrive at the dataset (temp3) . I probably do not need to go through all those explanations.
Let me just restructure my concern. Here is data temp3. ID in it stands for each individual row/record in the data for a response(represented by the variable 'answer') to a new_optionid.
temp3
ID | answer | Name | Name_of_unit | new_optionid |
---|---|---|---|---|
267899 | 1 | Alaska | xyz | Q0101C |
378990 | 0 | Alaska | xyz | Q0201C |
378991 | 0 | Alaska | xyz | Q0202C |
207320 | 0 | Alaska | xyz | Q0203C |
207330 | 1 | Alaska | xyz | Q0204C |
207340 | 1 | Alaska | xyz | Q0205C |
207350 | 1 | Alaska | xyz | Q0206C |
207360 | 1 | Alaska | xyz | Q0401C |
I need to transform temp3 to temp4:
temp4
Name | Name_of_unit | Q0101C | Q0201C | Q0202C | Q0203C | Q0204C | Q0205C | Q0206C | Q0401C |
---|---|---|---|---|---|---|---|---|---|
Alaska | xyz | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
The order of new_optionid from top to bottom in temp3 needs to be retained from 'left to right' in temp4. Is it possible to do that transformation in one go?
If this version of my concern is not appropriate, then please let me know.
P.S: new_optionid corresponds to items/options within a question which have a response of '1' or '0' or '9'.
e.g Question 01: Do you have fire drill protocol in your unit?
Option 01: *yes *no *dont know.........................................Q0101C (the first '01' in 'Q0101C' from the left corresponds to 'question 01' and the
second 01 after it corresponds to 'option 01' in 'question 01')
A response of 'yes' translates to '1' in the variable 'answer' and,
A response of 'no' translates to '0' and,
A response of 'dont know' translates to '9'
Isn't that a straightforward proc transpose? Have you tried a proc transpose that didn't work?
@reeza...I didn't use it. Does it work well when there are missing values in the data? It had messed up one of my tables in the past so was avoiding it.
Define missing values. Is that a missing one across all observations? If there is a value for one observation then it will be included for all.
@Reeza. Thanks for sharing that. Now for this data, once I sort by new_optionid, the first observation for the 'answer' will have '0' value (which stands for 'no') , and the rest of the questions will have a missing value( represented by a period). The best way to avoid any problems will be to recode missing values with a numeric value e.g. '999'
All right. I will use proc transpose then...
proc transpose data=tr out=trans_tr;
by Name Name_of_unit;
var answer;
id new_optionid;
run;
proc print data=trans_tr;
run;
proc transpose data=tr out=trans_tr;
by Name Name_of_unit;
var answer;
id new_optionid;
run;
proc print data=trans_tr;
run;
This is an old post but the last code was very helpful in transforming my data too (similar layout). Only problem is that I have duplicate persons on my by statement. I used the "let" in the procedure to test the code but I do not want the last observation. I want all of them in the wide format with numbers behind them (ethnic01 ethnic02, etc). Not sure how the code works to add this.
Thanks in advance! J
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.