Hi SAS users,
I have a table as below:
Variable1 Variable2 Variable3
1 a b
1 c d
and I would like to expand my columns (creating new variables) for each unique Variable1 as below:
Variable1 Variable2 Variable3 Variable4 Variable5
1 a b c d
I have milions of rows (with unique variable1). Could you please help with it?
Let me put it this way (another example):
Variable1 Variable2 Variable3
1 W 3.2
1 X 5.1
2 X 7.0
3 E 8.9
3 W 1.3
Whenever, the Variable2 corresponding to Variable1 (ID) is changing its value, we would like to expand it to new columns. In fact per each new category (based on Variable2), two new columns are added to the right:
Variable1 Variable2 Variable3 Variable4 Variable5
1 W 3.2 X 5.1
2 X 7.0 null null
3 E 8.9 W 1.3
PURPOSE: At the end, we are interested to have a unique Variable1 in first column; and the corresponding category(ies) are expanded to the right instead of stacking on top of eachother.
I hope its more clear now. Thank you for help
Well, two questions.
The first question is why. What is it your attempting to do which requires a transposed dataset. If you think about it you have millions of rows, which could mean up to 3 * millions of columns. Totally unworkable. SAS procedures, and SQL, allow what is called by group processing or grouped functions. These are easier to program, and faster.
The second question is less relevant, as from point 1 I don't believe transposing is a good idea. But if you were to do that, what relationship is there in the data to say that a should be in variable2, and c in variable4 - I mean if the data is sorted differently, they could swap over.
If it was my data I would actually normalise that data further - variable 1 appears to be some sort of ID, and will assume for now there is no relation between VAR2-VAR3:
ID VAR
1 A
1 B
1 C
1 D
...
Its very easy to group the data up in this structure.
Yes, but you haven't answered the question, why? What is the purpose of having a transposed dataset? In you example, you have only 2 elements being transposed, however you say your data has millions of records, what are you intending to do with - assuming it even runs with this many - a dataset with possible hundreds of columns?
Unless there is a very specific reason to have hundreds of columns, the dataset you have now is far easier to work with. Take your example dataset. Say I want to get the mean of all W variables. How are you going to do this in your transposed dataset? You would need to setup arrays, scan over the arrays, check if element is W, then add element+1 value to a counter etc. Far more code than simply:
proc means data=have;
where variable2="W";
output out=want mean=mean;
run;
You see how simpler the code is to work with, and it will be far quicker than array processing. The only time I can see transposed data being beneficial is in an output file for people to review, however if you have hundreds of columns, is it really going to get reviewed?
Although there are millions of rows, it is not going to generate hundreds of columns. Because, the number of Variable2 values (categories) are limited; it is probably about 10 . So it means that I am going to have 9*2 = 18 new expanded columns.
Again, the purpose is to use this file to merge with another dataset. It makes it easier to review the databse once I have only unique Variable1 values in 1st column.
Thank you
Hi @almmotamedi,
You can find no less than four different brilliant techniques for this task in this very recent thread:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.