I am trying to perform a transpose from wide to long. There are 27 variables that I am transposing. The data looks essentially like this:
id var1 var2 var3 ... var27
1 a b c
In the past when I have done this transpose, the result looks like this:
id _name_ COL1
1 var1 a
1 var2 b
1 var3 c
...
1 var27
However, today the result I am getting looks like this:
id _name_ COL1 COL2 COL3 ... COL25
1 var1 a a a
1 var2 b b b
1 var3 c c c
...
1 var27
This is the code I used:
proc transpose data = data out = data_long ;
by id;
var var1 - var27;
run;
What am I doing wrong here? Why and I getting so many columns?
My variables are all consecutive in the data set and all are the same format- $8.
Using SAS 9.4 on Windows
The BY variable values are not unique. The number of variables ("columns") created is equal to the largest number of observations ("rows") for any single BY group.
That makes sense Tom, thank you. So removing duplicate IDs should solve this then right?
@miriamhartig wrote:
That makes sense Tom, thank you. So removing duplicate IDs should solve this then right?
Yes, no and maybe.
The issue may then become which of the value set you want for the original combination of values for var1 - varn. If all of the Id have the same var1 - varn then selection of which observation (unique id value) to keep and transform is easy.
But what if, for some reason we can't tell at this point you want the var1 value from the first record with Id=1 and the var2 value from the 4th record with Id=1 (and so on for other vars)? Then how to get a single ID for transpose may be a bit tricky.
And if you really need all the values perhaps a manual transpose is in order:
data example; input id var1 $ var2 $ var3$; datalines; 1 a b c 2 b c d 3 c d e 1 d e c 2 p d q ; run; data manualtrans (keep= id _name_ value); set example; array v var1-var3; length _name_ $10.; do i = 1 to dim(v); _name_ = vname(v[i]); value = v[i]; output; end; run;
@miriamhartig wrote:
That makes sense Tom, thank you. So removing duplicate IDs should solve this then right?
Or adding another variable (or more) to the BY statement so that each row is identified uniquely.
data fix;
set have ;
by idvar ;
if first.idvar then row_num=0;
row_num+1;
run;
proc transpose data=fix .... ;
by idvar row_num;
....
run;
Yes, it has at least helped me.
Thank you.
- Dr. Abhijeet Safai
Thank you. Still my problem is not solved on which I am working, but this answer has given me a good hint. Thanks once again.
- Dr. Abhijeet Safai
Now I understood it. Yes, you rightly pointed out that by variable is not unique. It helped me to solve my issue. Thanks once again.
- Dr. Abhijeet Safai
I would like to post it again that the solution given by @Tom is wonderful. He said that the records are not unique and that gave me hint and I was able to find out a variable which was not previously included in the by statement of proc transpose and this problem was solved. I am posting here again because again I faced the same problem and it got solved by addition of a variable in by statement which was not added previously. So the solution to this kind of problem is mostly - to find out the variable which is not added in by statement of proc transpose and to add it.
Thank you.
- Dr. Abhijeet Safai
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.