Solved: Re: Data Transformation: Reshape Wide to Long

renanm · Posted 06-20-2019 09:11 AM

Hello everyone,

I've been trying to do a transformation which I though would be quite simple. After trying most approaches shared in this forum I'm still not getting the expected results so looking for some help/guidance.

I have a dataset that looks like:

data have;
input id tr1 tr2 tr3 tr4;
cards;
1 AAA BBB GGG FFF
2 DDD CCC AAA BBB
3 BBB AAA JJJ VVV
;
run;

The only difference is my real data is that variable TRX goes from 1 to 125 (tr1....tr125).

I would like the data to look like:

data want;
input id tr;
cards;
1 AAA
1 BBB
1 GGG
1 FFF
2 DDD
2 CCC
2 AAA
2 BBB
3 BBB
3 AAA
3 JJJ
3 VVV
;
run;

In essence, instead of multiple columns I'd like multiple rows.

The following code seems to work in very similar cases but not working for this one:

proc transpose data=have out=have_trans;
   by ID;
   var tr1-tr4;
run;

This yield a dataset with the following columns:

ID _NAME_ COL1 COL2 COL3 COL4 COL5........

How can I make it so that it yields only ID _NAME_ COL1 with all values previous in multiple columns contained in COL1.

Thanks !

ballardw · Posted 06-20-2019 11:03 AM

Examine your data. I will bet that you find that you have duplicate values of one or more values of ID.

Example: ID=1 appears twice, so we get col1 and col2 , one column for each duplicate of id. So if your output includes COL25 then at least one value of ID is duplicated 25 times in the input set.

data have;
input id tr1 $ tr2 $ tr3 $ tr4 $;
cards;
1 AAA BBB GGG FFF
1 DDD CCC AAA BBB
3 BBB AAA JJJ VVV
;
run;

proc transpose data=have out=have_trans;
   by id;
   var tr1-tr4;
run;

View solution in original post

PeterClemmensen · Posted 06-20-2019 09:14 AM

Works fine?

data have;
input id tr1 $ tr2 $ tr3 $ tr4 $;
cards;
1 AAA BBB GGG FFF
2 DDD CCC AAA BBB
3 BBB AAA JJJ VVV
;
run;

proc transpose data=have out=have_trans(drop=_NAME_ rename=(COl1=tr));
   by ID;
   var tr1-tr4;
run;

The DATA to DATA Step Macro
Blog: SASnrd

renanm · Posted 06-20-2019 10:01 AM

Thanks for your response.

I had previously tried this, but unfortunately it doesn't work as expected even though it does work in this sample dataset.

Essentially, in my real data this returns the following structure:

ID TR COL2 COL3 COL4 COL5.....COL125

Not sure why the remaining columns continue to show up.

I need the final data to have only the following with all values contained in the variable TR:

ID TR

Thanks for the support.

Kurt_Bremser · Posted 06-20-2019 10:29 AM

Please post the log of the step that created this structure.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

renanm · Posted 06-20-2019 10:43 AM

interestingly, I I just re-ran the same code on a smaller dataset (for speed purposes) and it now returns only up to COL3:

ID TR COL2 COL3

@Kurt_Bremser, here's the log:

;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Tracking File';
4 %LET _CLIENTPROJECTPATH='';
5 %LET _CLIENTPROJECTNAME='';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
13 STYLE=HtmlBlue
14 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome2/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
15 NOGTITLE
16 NOGFOOTNOTE
17 GPATH=&sasworklocation
18 ENCODING=UTF8
19 options(rolap="on")
20 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
21
22 GOPTIONS ACCESSIBLE;
23 proc transpose data=have out=want_trans (drop=_NAME_ rename=(COl1=tr));
24 by ID;
25 var TR0-TR125;
26 run;

NOTE: There were 15824 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT_TRANS has 276423 observations and 4 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.10 seconds
cpu time 0.09 seconds

27
28 GOPTIONS NOACCESSIBLE;
29 %LET _CLIENTTASKLABEL=;
30 %LET _CLIENTPROJECTPATH=;
31 %LET _CLIENTPROJECTNAME=;
32 %LET _SASPROGRAMFILE=;
33
34 ;*';*";*/;quit;run;
35 ODS _ALL_ CLOSE;
36
37
38 QUIT; RUN;
39

ballardw · Posted 06-20-2019 11:03 AM

Examine your data. I will bet that you find that you have duplicate values of one or more values of ID.

Example: ID=1 appears twice, so we get col1 and col2 , one column for each duplicate of id. So if your output includes COL25 then at least one value of ID is duplicated 25 times in the input set.

data have;
input id tr1 $ tr2 $ tr3 $ tr4 $;
cards;
1 AAA BBB GGG FFF
1 DDD CCC AAA BBB
3 BBB AAA JJJ VVV
;
run;

proc transpose data=have out=have_trans;
   by id;
   var tr1-tr4;
run;

renanm · Posted 06-20-2019 12:16 PM

@ballardw thanks for taking the time to help.

You're absolutely right. After deduping the file it worked as expected.

Glad to learn duplicates should be eliminated before running this procedure.

Thanks again !

ballardw · Posted 06-20-2019 05:32 PM

@renanm wrote:

@ballardw thanks for taking the time to help.

You're absolutely right. After deduping the file it worked as expected.

Glad to learn duplicates should be eliminated before running this procedure.

Thanks again !

Knowing the cause is helpful. Often you may actually want multiple columns like that for some purposes.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away