SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

trasform/rotate datasets using arrays

Reply
Frequent Contributor
Posts: 124

trasform/rotate datasets using arrays

Hi , I am looking to rotate a SAS dataset (temp3) ,basically converting a tall dataset to a fat dataset. The dataset 'represents answers to each option within a question'. The original ids for the questions and its options were not labeled in a sequential/logical  manner. I have new labels for each of the 'options' within a questionid using certain criteria represented by the variable name new_optionid. Copy-pasted below is how the dataset temp3 looks after sorting and stacking datasets by sequential order of questionid and optionid.

IDquestionidoptionidanswerNameName_of_unitsortidnew_questionidnew_optionidnew_sortid
267899268371Alaskaxyz11Q0101C1
378990274670Alaskaxyz12Q0201C2
378991275190Alaskaxyz22Q0202C3
207320275200Alaskaxyz32Q0203C4
207330275211Alaskaxyz42Q0204C5
207340275221Alaskaxyz52Q0205C6
207350275231Alaskaxyz62Q0206C7
207360581701Alaskaxyz73Q0401C8

Questionid=26 and its optionid=837 corresponds to new_optionid Q0101C...the first 2 digits '01' in 'Q0101C' corresponds to questionid=26 and next 2 digits '01' in 'Q0101C' corresponds to optionid=837

Questionid=27 and its optionid=467 corresponds to new_optionid Q0201C...the first 2 digits '02' in 'Q0201C' corresponds to questionid=27 and next 2 digits '01' in 'Q0201C' corresponds to optionid=467..and so on

Since the dataset only houses non-missing answers, the new_optionid only contains only those answered ids . I also created a numeric field called new_sortid to use arrays with numeric data only. So, new_optionid='Q0101C' corresponds to new_sortid=1 and so on.

I need to have a final dataset that looks like this, where the responses in the answer column are below each new_optionid:

NameName_of_unitQ0101CQ0201CQ0202CQ0203CQ0204CQ0205CQ0206CQ0401C
Alaskaxyz10001111

I used an array (below is the code I began with)...but its not giving me what I need..could I use an array without creating a new_sortid. How can I incorporate the character value of new_optionid as column names? Please advice.

data temp4;

array s(9) s1-s9;

retain s1-s9;

set temp3;

by new_questionid;

s(new_sortid)=answer;

if last.new_questionid then output;

keep new_questionid s1-s9 name name_of_unit ;

run;

the above code gave me (which is making it more complicated).

s1s2s3s4s5s6s7..etcnew_questionid
1......1
10001112
10001113
Super User
Posts: 10,500

Re: trasform/rotate datasets using arrays

I'm not sure what your explanation with questionid and optionid other than having something to do with recapturing the original question 'names'. I suspect that logic to assign that value would be the first part but I don't get the "digits" as you give no idea what questionid 27 and optionis 519 through 522 mean.

And since Questionid 58 has an option id 170 what does that relate to?

Is the "meaning" of Q0201C something like Question 2, first category selected(if 1, not selected if 0?) and Q0202C Question 2, Second category (like Question 2 is a "mark all that apply" question)?

Do any of the OPTIONID values appear associated with more than one QUESTIONID? If not then it may be that questionid is redundant.

You don't show any relationship between the example raw data and the S variables in the example code. So no clue as why it isn't working or even what it may actually do.

And what happens to the first column just labeled ID?

If we can get something that looks close to that

Frequent Contributor
Posts: 124

Re: trasform/rotate datasets using arrays

@Ballardw..all right , I see what you mean. Its hard to explain as a lot of data management has been done to arrive at the dataset (temp3) . I probably do not need to go through all those explanations.

Let me just restructure my concern. Here is data temp3.  ID in it stands for each individual row/record in the data for a response(represented by the variable 'answer')  to a new_optionid.

temp3

IDanswerNameName_of_unitnew_optionid
2678991AlaskaxyzQ0101C
3789900AlaskaxyzQ0201C
3789910AlaskaxyzQ0202C
2073200AlaskaxyzQ0203C
2073301AlaskaxyzQ0204C
2073401AlaskaxyzQ0205C
2073501AlaskaxyzQ0206C
2073601AlaskaxyzQ0401C

I need to transform temp3 to temp4:

temp4

NameName_of_unitQ0101CQ0201CQ0202CQ0203CQ0204CQ0205CQ0206CQ0401C
Alaskaxyz10001111

The order of new_optionid  from top to bottom in temp3 needs to be retained from 'left to right' in temp4. Is it possible to do that transformation in one go?

If this version of my concern is not appropriate, then please let me know.

P.S: new_optionid corresponds to items/options within a question which have a response of '1' or '0' or '9'.

e.g    Question 01: Do you have fire drill protocol in your unit?

                            Option 01: *yes    *no   *dont know.........................................Q0101C (the first '01' in 'Q0101C' from the left corresponds to 'question 01' and the

                                                                                                                                    second 01 after it corresponds to 'option 01' in 'question 01')

                            A response of 'yes' translates to '1' in the variable 'answer' and,

                            A response of 'no' translates to '0' and,

                            A response of 'dont know' translates to '9'

Super User
Posts: 17,840

Re: trasform/rotate datasets using arrays

Isn't that a straightforward proc transpose? Have you tried a proc transpose that didn't work?

Frequent Contributor
Posts: 124

Re: trasform/rotate datasets using arrays

@reeza...I didn't use it. Does it work well when there are missing values in the data? It had messed up one of my tables in the past so was avoiding it.

Super User
Posts: 17,840

Re: trasform/rotate datasets using arrays

Define missing values. Is that a missing one across all observations? If there is a value for one observation then it will be included for all.

Frequent Contributor
Posts: 124

Re: trasform/rotate datasets using arrays

@Reeza. Thanks for sharing that. Now for this data, once I sort  by new_optionid, the first observation for the 'answer' will have '0' value (which stands for 'no') , and the rest of the questions will have a missing value( represented by a period). The best way to avoid any problems will be to recode missing values with a numeric value e.g. '999'

Frequent Contributor
Posts: 124

Re: trasform/rotate datasets using arrays

All right. I will use proc transpose then...

Occasional Contributor
Posts: 15

Re: trasform/rotate datasets using arrays

proc transpose data=tr out=trans_tr;

by Name Name_of_unit;

    var answer;

    id new_optionid;

run;

proc print data=trans_tr;

run;

New Contributor
Posts: 3

Re: trasform/rotate datasets using arrays

proc transpose data=tr out=trans_tr;

by Name Name_of_unit;

    var answer;

    id new_optionid;

run;

 

proc print data=trans_tr;

run;

 

This is an old post but the last code was very helpful in transforming my data too (similar layout). Only problem is that I have duplicate persons on my by statement. I used the "let" in the procedure to test the code but I do not want the last observation. I want all of them in the wide format with numbers behind them (ethnic01 ethnic02, etc). Not sure how the code works to add this. 

 

Thanks in advance! J

Ask a Question
Discussion stats
  • 9 replies
  • 589 views
  • 7 likes
  • 5 in conversation