I have a data set that I need to reorganize into columns and I'm worried there will be too much risk for human error if I try to hard code them all. Data set example:
ID | QUESTN_CD | QUESTN_TXT | QUESTN_ANSWER_TXT |
123 | 1 | Gender | M |
123 | 2 | Age | 36 |
123 | 3 | Fulltime | Y |
123 | 4 | Hourly Pay | N |
My desired output is something like this:
ID | GENDER | AGE | FULLTIME | HOURLY PAY |
123 | M | 36 | Y | N |
Code example with dummy data:
PROC SQL;
CREATE TABLE QUESTNS AS
SELECT DISTINCT t1.*,
t2.QUESTN_ANSWR_TXT AS 'Contract Signed'n,
t3.QUESTN_ANSWR_TXT AS 'Assignment Start Date'n,
t4.QUESTN_ANSWR_TXT AS 'Correct W4'n,
t5.QUESTN_ANSWR_TXT AS 'Name Spelled Properly'n,
t6.QUESTN_ANSWR_TXT AS 'Alterations to Assignment'n,
t7.QUESTN_ANSWR_TXT AS 'Assignment Type'n
FROM WORK. PERSON_INFO t1
left join QUESTN t2 on t1.ID = t2.REF_ID
left join QUESTN t3 on t1.ID = t3.REF_ID
left join QUESTN t4 on t1.ID = t4.REF_ID
left join QUESTN t5 on t1.ID = t5.REF_ID
left join QUESTN t6 on t1.ID = t6.REF_ID
left join QUESTN t7 on t1.ID = t7.REF_ID
WHERE (t2.QUESTN_CD = 9 AND t3.QUESTN_CD = 23 AND t4.QUESTN_CD = 8 AND t5.QUESTN_CD = 29
AND t6.QUESTN_CD = 37 AND t7.QUESTN_CD = 22);
QUIT;
But I think there's too much room for human error in manually renaming each column based on the question code, given that there are over 40 question codes (for now) and the data source may update and change without my knowledge.
Any and all advice appreciated!
data have;
infile cards truncover;
input ID QUESTN_CD QUESTN_TXT & $15. QUESTN_ANSWER_TXT $;
cards;
123 1 Gender M
123 2 Age 36
123 3 Fulltime Y
123 4 Hourly Pay N
;
proc transpose data=have out=want(drop=_:);
by id;
var QUESTN_ANSWER_TXT;
id QUESTN_TXT;
run;
Look up PROC TRANSPOSE instead.
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
@S_Nav wrote:
I have a data set that I need to reorganize into columns and I'm worried there will be too much risk for human error if I try to hard code them all. Data set example:
ID QUESTN_CD QUESTN_TXT QUESTN_ANSWER_TXT 123 1 Gender M 123 2 Age 36 123 3 Fulltime Y 123 4 Hourly Pay N
My desired output is something like this:
ID GENDER AGE FULLTIME HOURLY PAY 123 M 36 Y N
Code example with dummy data:
PROC SQL; CREATE TABLE QUESTNS AS SELECT DISTINCT t1.*, t2.QUESTN_ANSWR_TXT AS 'Contract Signed'n, t3.QUESTN_ANSWR_TXT AS 'Assignment Start Date'n, t4.QUESTN_ANSWR_TXT AS 'Correct W4'n, t5.QUESTN_ANSWR_TXT AS 'Name Spelled Properly'n, t6.QUESTN_ANSWR_TXT AS 'Alterations to Assignment'n, t7.QUESTN_ANSWR_TXT AS 'Assignment Type'n FROM WORK. PERSON_INFO t1 left join QUESTN t2 on t1.ID = t2.REF_ID left join QUESTN t3 on t1.ID = t3.REF_ID left join QUESTN t4 on t1.ID = t4.REF_ID left join QUESTN t5 on t1.ID = t5.REF_ID left join QUESTN t6 on t1.ID = t6.REF_ID left join QUESTN t7 on t1.ID = t7.REF_ID WHERE (t2.QUESTN_CD = 9 AND t3.QUESTN_CD = 23 AND t4.QUESTN_CD = 8 AND t5.QUESTN_CD = 29 AND t6.QUESTN_CD = 37 AND t7.QUESTN_CD = 22); QUIT;
But I think there's too much room for human error in manually renaming each column based on the question code, given that there are over 40 question codes (for now) and the data source may update and change without my knowledge.
Any and all advice appreciated!
data have;
infile cards truncover;
input ID QUESTN_CD QUESTN_TXT & $15. QUESTN_ANSWER_TXT $;
cards;
123 1 Gender M
123 2 Age 36
123 3 Fulltime Y
123 4 Hourly Pay N
;
proc transpose data=have out=want(drop=_:);
by id;
var QUESTN_ANSWER_TXT;
id QUESTN_TXT;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.