trasform/rotate datasets using arrays

dr2014 · Posted 08-27-2015 11:32 AM

Hi , I am looking to rotate a SAS dataset (temp3) ,basically converting a tall dataset to a fat dataset. The dataset 'represents answers to each option within a question'. The original ids for the questions and its options were not labeled in a sequential/logical manner. I have new labels for each of the 'options' within a questionid using certain criteria represented by the variable name new_optionid. Copy-pasted below is how the dataset temp3 looks after sorting and stacking datasets by sequential order of questionid and optionid.

ID	questionid	optionid	answer	Name	Name_of_unit	sortid	new_questionid	new_optionid	new_sortid
267899	26	837	1	Alaska	xyz	1	1	Q0101C	1
378990	27	467	0	Alaska	xyz	1	2	Q0201C	2
378991	27	519	0	Alaska	xyz	2	2	Q0202C	3
207320	27	520	0	Alaska	xyz	3	2	Q0203C	4
207330	27	521	1	Alaska	xyz	4	2	Q0204C	5
207340	27	522	1	Alaska	xyz	5	2	Q0205C	6
207350	27	523	1	Alaska	xyz	6	2	Q0206C	7
207360	58	170	1	Alaska	xyz	7	3	Q0401C	8

Questionid=26 and its optionid=837 corresponds to new_optionid Q0101C...the first 2 digits '01' in 'Q0101C' corresponds to questionid=26 and next 2 digits '01' in 'Q0101C' corresponds to optionid=837

Questionid=27 and its optionid=467 corresponds to new_optionid Q0201C...the first 2 digits '02' in 'Q0201C' corresponds to questionid=27 and next 2 digits '01' in 'Q0201C' corresponds to optionid=467..and so on

Since the dataset only houses non-missing answers, the new_optionid only contains only those answered ids . I also created a numeric field called new_sortid to use arrays with numeric data only. So, new_optionid='Q0101C' corresponds to new_sortid=1 and so on.

I need to have a final dataset that looks like this, where the responses in the answer column are below each new_optionid:

Name	Name_of_unit	Q0101C	Q0201C	Q0202C	Q0203C	Q0204C	Q0205C	Q0206C	Q0401C
Alaska	xyz	1	0	0	0	1	1	1	1

I used an array (below is the code I began with)...but its not giving me what I need..could I use an array without creating a new_sortid. How can I incorporate the character value of new_optionid as column names? Please advice.

data temp4;

array s(9) s1-s9;

retain s1-s9;

set temp3;

by new_questionid;

s(new_sortid)=answer;

if last.new_questionid then output;

keep new_questionid s1-s9 name name_of_unit ;

run;

the above code gave me (which is making it more complicated).

s1	s2	s3	s4	s5	s6	s7..etc	new_questionid
1	.	.	.	.	.	.	1
1	0	0	0	1	1	1	2
1	0	0	0	1	1	1	3

ballardw · Posted 08-27-2015 12:47 PM

I'm not sure what your explanation with questionid and optionid other than having something to do with recapturing the original question 'names'. I suspect that logic to assign that value would be the first part but I don't get the "digits" as you give no idea what questionid 27 and optionis 519 through 522 mean.

And since Questionid 58 has an option id 170 what does that relate to?

Is the "meaning" of Q0201C something like Question 2, first category selected(if 1, not selected if 0?) and Q0202C Question 2, Second category (like Question 2 is a "mark all that apply" question)?

Do any of the OPTIONID values appear associated with more than one QUESTIONID? If not then it may be that questionid is redundant.

You don't show any relationship between the example raw data and the S variables in the example code. So no clue as why it isn't working or even what it may actually do.

And what happens to the first column just labeled ID?

If we can get something that looks close to that

dr2014 · Posted 08-27-2015 01:32 PM

@Ballardw..all right , I see what you mean. Its hard to explain as a lot of data management has been done to arrive at the dataset (temp3) . I probably do not need to go through all those explanations.

Let me just restructure my concern. Here is data temp3. ID in it stands for each individual row/record in the data for a response(represented by the variable 'answer') to a new_optionid.

temp3

ID	answer	Name	Name_of_unit	new_optionid
267899	1	Alaska	xyz	Q0101C
378990	0	Alaska	xyz	Q0201C
378991	0	Alaska	xyz	Q0202C
207320	0	Alaska	xyz	Q0203C
207330	1	Alaska	xyz	Q0204C
207340	1	Alaska	xyz	Q0205C
207350	1	Alaska	xyz	Q0206C
207360	1	Alaska	xyz	Q0401C

I need to transform temp3 to temp4:

temp4

Name	Name_of_unit	Q0101C	Q0201C	Q0202C	Q0203C	Q0204C	Q0205C	Q0206C	Q0401C
Alaska	xyz	1	0	0	0	1	1	1	1

The order of new_optionid from top to bottom in temp3 needs to be retained from 'left to right' in temp4. Is it possible to do that transformation in one go?

If this version of my concern is not appropriate, then please let me know.

P.S: new_optionid corresponds to items/options within a question which have a response of '1' or '0' or '9'.

e.g Question 01: Do you have fire drill protocol in your unit?

Option 01: *yes *no *dont know.........................................Q0101C (the first '01' in 'Q0101C' from the left corresponds to 'question 01' and the

second 01 after it corresponds to 'option 01' in 'question 01')

A response of 'yes' translates to '1' in the variable 'answer' and,

A response of 'no' translates to '0' and,

A response of 'dont know' translates to '9'

Reeza · Posted 08-27-2015 01:47 PM

Isn't that a straightforward proc transpose? Have you tried a proc transpose that didn't work?

dr2014 · Posted 08-27-2015 01:55 PM

@reeza...I didn't use it. Does it work well when there are missing values in the data? It had messed up one of my tables in the past so was avoiding it.

Reeza · Posted 08-27-2015 03:07 PM

Define missing values. Is that a missing one across all observations? If there is a value for one observation then it will be included for all.

dr2014 · Posted 08-27-2015 04:11 PM

@Reeza. Thanks for sharing that. Now for this data, once I sort by new_optionid, the first observation for the 'answer' will have '0' value (which stands for 'no') , and the rest of the questions will have a missing value( represented by a period). The best way to avoid any problems will be to recode missing values with a numeric value e.g. '999'

dr2014 · Posted 08-27-2015 02:47 PM

All right. I will use proc transpose then...

samirt · Posted 08-31-2015 03:22 AM

proc transpose data=tr out=trans_tr;

by Name Name_of_unit;

var answer;

id new_optionid;

run;

proc print data=trans_tr;

run;

jljones · Posted 07-18-2017 12:23 PM

proc transpose data=tr out=trans_tr;

by Name Name_of_unit;

var answer;

id new_optionid;

run;

proc print data=trans_tr;

run;

This is an old post but the last code was very helpful in transforming my data too (similar layout). Only problem is that I have duplicate persons on my by statement. I used the "let" in the procedure to test the code but I do not want the last observation. I want all of them in the wide format with numbers behind them (ethnic01 ethnic02, etc). Not sure how the code works to add this.

Thanks in advance! J

trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Re: trasform/rotate datasets using arrays

Registration is open