I am trying to transpose data in SAS from a long format to a wide format. The problem I'm having is that I have multiple columns that I'm trying to transpose. I have a few example datasets below to demonstrate what I'm trying to do. The actual dataset I'm doing this on is going to be very large, I think one way to handle this could be to tranpose individual columns and then merge at the end, but the dataset I'm going to be doing this on is going to be significantly larger (tens of thousands of columns), so this will be pretty unfeasible.
Below is the data format I'm starting with:
data current_state; input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars; datalines; 1 d234 d463 d213 d678 d435 50 2 d213 d690 d360 d145 d269 25 3 d409 d231 d463 d690 d609 10 ;
Here is my desired data format:
data desired_state; input id $ d145 $ d213 $ d231 $ d234 $ d269 $ d360 $ d409 $ d435 $ d463 $ d609 $ d678 $ d690; datalines; 1 0 50 0 50 0 0 0 50 0 0 50 0 2 25 25 0 0 25 25 0 0 0 0 0 25 3 0 0 10 0 0 0 10 0 10 10 0 10 ;
Is there a concise way to do this that I'm missing?
I would accomplish the task using two macros that I and others wrote and presented at SAS Global Forums, namely:
data current_state;
input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;
filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=current_state, out=long,
by=id, var=att, id=na, delimiter=_, copy=Dollars)
filename tr url 'http://tiny.cc/transpose_macro';
%include tr ;
%transpose(data=long, out=want, by=id, var=Dollars, id=att,
sort=yes, use_varname=no)
Art, CEO, AnalystFinder.com
It might help to 1) provide code used to attempt the transpose and 2) describe exactly what is not working, better to provide some example data that demonstrates what does not work along with what the desired result is.
Also any process that involves creating "tens of thousands of columns" is sort of suspect. What are you going to do with that wide data set?
So I do have what the desired output would be in the question, I have tried multiple solutions but here is one:
proc transpose data=current_state out=test1; by id; id att_1 att_2 att_3 att_4 att_5; var Dollars; run;
Essentially the part I'm having trouble with is getting unique column values and distributing the dollar amounts to the correct columns since the values for att_1-att_5 can be the same.
As far as what I'm doing with it, I'm going to be trying to do some nearest neighbor analysis.
I will start by saying its really not a good idea. You are multiplying up the data (the 50 for instance appears 4 times rather than once), so your data will be larger = harder to programm with and take more resources.
As for how you do it, well its going to be a bit more complicated than a proc transpose, you first need to get the data into a method for transposing. So from this:
1 d234 d463 d213 d678 d435 50
To:
1 d234 50
1 d463 50
1 d213 50
1 d678 50
1 d435 50
You can do this in a datastep:
data inter (keep=id vn dollars); length vn $200; set current_state; array v{*} att_; do i=1 to dim(v); vn=v{i}; output; end; run; proc transpose data=inter out=want; by id; var dollars; id vn; idlabel vn; run;
Note you would then need to add zeros for the missings etc. So a whole chunk of work to get data which is far bigger in size and less than optimal to work with.
Thank you! My next task is to figure out how to save it as a sparse matrix so it isn't quite so memory intensive. If you have any suggestions for me there I'd be happy to hear them... Or I might end up posting another question later on if I can't figure it out.
As I probably mentioned above, I wouldn't. I would keep it in long format. Generally most procedures are fine with long data, so matrices can be avoid.
Are you familiar with KNN and related analyses? I'm unaware how to do those types of operations on a long formatted data set but I'm all ears if SAS has that functionality.
So I apologize but I just realized I'm having a slight issue with the code you provided... The first step only generates a dataset three rows long, and it has none of the attributes listed out in the vn column. Am I missing something? Does it generate a long dataset for you?
Sorry, you have lost me. The step I provided takes the wide data, into a long format, so vn takes the variable name into a column. This is then used in the transpose back up again. What attributes do you mean, label? If so you need to extract that also simply grab the label with vlabel() function before the point of output, and save that to a variable as well. Then when transposing you can use idlabel <variable you created with label in>;
No I do not know "KNN and related analyses", and I am aware that some certain analysis do require data in a set format, so can't really help or advise there.
When I run the first block of code:
data inter (keep=id vn dollars); length vn $200; set current_state; array v{*} att_; do i=1 to dim(v); vn=v{i}; output; end; run;
I get a 3 row dataset with 3 columns (vn,id and Dollars). The vn field is not populated with att_1, att_2, etc it is just blank. Which then when you run the second block of code:
proc transpose data=inter out=want; by id; var dollars; id vn; idlabel vn; run;
You just get a dataset with 3 rows and 2 columns... Which is not the desired output I described above.
Typo, missing a semi-colon after att_:
data current_state; input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars; datalines; 1 d234 d463 d213 d678 d435 50 2 d213 d690 d360 d145 d269 25 3 d409 d231 d463 d690 d609 10 ; run; data inter (keep=id vn dollars); length vn $200; set current_state; array v{*} att_:; do i=1 to dim(v); vn=v{i}; output; end; run; proc transpose data=inter out=want; by id; var dollars; id vn; idlabel vn; run;
I would accomplish the task using two macros that I and others wrote and presented at SAS Global Forums, namely:
data current_state;
input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;
filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=current_state, out=long,
by=id, var=att, id=na, delimiter=_, copy=Dollars)
filename tr url 'http://tiny.cc/transpose_macro';
%include tr ;
%transpose(data=long, out=want, by=id, var=Dollars, id=att,
sort=yes, use_varname=no)
Art, CEO, AnalystFinder.com
One followup question to this one... How would I make your macro work if the data didn't have sequential column names? For example:
data current_state; input id $ att $ type $ def $ part $ section $ Dollars; datalines; 1 d234 d463 d213 d678 d435 50 2 d213 d690 d360 d145 d269 25 3 d409 d231 d463 d690 d609 10 ;
@A_SAS_Man: I would simply rename them within the data parameter and use the same code. e.g.:
data current_state;
input id $ att $ type $ def $ part $ section $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;
filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=current_state
(rename=(att=att1 type=att2 def=att3 part=att4 section=att5)),
out=long, by=id, var=att, id=na, copy=Dollars)
filename tr url 'http://tiny.cc/transpose_macro';
%include tr ;
%transpose(data=long, out=want, by=id, var=Dollars, id=att,
sort=yes, use_varname=no)
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.