BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_SAS_Man
Pyrite | Level 9

I am trying to transpose data in SAS from a long format to a wide format. The problem I'm having is that I have multiple columns that I'm trying to transpose. I have a few example datasets below to demonstrate what I'm trying to do. The actual dataset I'm doing this on is going to be very large, I think one way to handle this could be to tranpose individual columns and then merge at the end, but the dataset I'm going to be doing this on is going to be significantly larger (tens of thousands of columns), so this will be pretty unfeasible.

 

Below is the data format I'm starting with:

data current_state;
input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

 Here is my desired data format:

data desired_state;
input id $ d145 $ d213 $ d231 $ d234 $ d269 $ d360 $ d409 $ d435 $ d463 $ d609 $ d678 $ d690;
datalines;
1 0 50 0 50 0 0 0 50 0 0 50 0
2 25 25 0 0 25 25 0 0 0 0 0 25
3 0 0 10 0 0 0 10 0 10 10 0 10
;

Is there a concise way to do this that I'm missing?

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I would accomplish the task using two macros that I and others wrote and presented at SAS Global Forums, namely:

 

data current_state;
  input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
  datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=current_state, out=long,
 by=id, var=att, id=na, delimiter=_, copy=Dollars)

filename tr url 'http://tiny.cc/transpose_macro';
%include tr ;
%transpose(data=long, out=want, by=id, var=Dollars, id=att,
   sort=yes, use_varname=no)

Art, CEO, AnalystFinder.com

 

View solution in original post

13 REPLIES 13
ballardw
Super User

It might help to 1) provide code used to attempt the transpose and 2) describe exactly what is not working, better to provide some example data that demonstrates what does not work along with what the desired result is.

 

Also any process that involves creating "tens of thousands of columns" is sort of suspect. What are you going to do with that wide data set?

A_SAS_Man
Pyrite | Level 9

So I do have what the desired output would be in the question, I have tried multiple solutions but here is one:

 

proc transpose data=current_state out=test1;
by id;
id att_1 att_2 att_3 att_4 att_5;
var Dollars;
run;

Essentially the part I'm having trouble with is getting unique column values and distributing the dollar amounts to the correct columns since the values for att_1-att_5 can be the same.

 

As far as what I'm doing with it, I'm going to be trying to do some nearest neighbor analysis.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I will start by saying its really not a good idea.  You are multiplying up the data (the 50 for instance appears 4 times rather than once), so your data will be larger = harder to programm with and take more resources.

 

As for how you do it, well its going to be a bit more complicated than a proc transpose, you first need to get the data into a method for transposing.   So from this:

1 d234 d463 d213 d678 d435 50

 To:

1 d234 50 
1 d463 50
1 d213 50
1 d678 50
1 d435 50

You can do this in a datastep:

data inter (keep=id vn dollars);
  length vn $200;
  set current_state;
  array v{*} att_;
  do i=1 to dim(v);
    vn=v{i};
    output;
  end;
run;

proc transpose data=inter out=want;
  by id;
  var dollars;
  id vn;
  idlabel vn;
run;

Note you would then need to add zeros for the missings etc.  So a whole chunk of work to get data which is far bigger in size and less than optimal to work with. 

A_SAS_Man
Pyrite | Level 9

Thank you! My next task is to figure out how to save it as a sparse matrix so it isn't quite so memory intensive. If you have any suggestions for me there I'd be happy to hear them... Or I might end up posting another question later on if I can't figure it out.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

As I probably mentioned above, I wouldn't.  I would keep it in long format.  Generally most procedures are fine with long data, so matrices can be avoid.

A_SAS_Man
Pyrite | Level 9

Are you familiar with KNN and related analyses? I'm unaware how to do those types of operations on a long formatted data set but I'm all ears if SAS has that functionality.

A_SAS_Man
Pyrite | Level 9

So I apologize but I just realized I'm having a slight issue with the code you provided... The first step only generates a dataset  three rows long, and it has none of the attributes listed out in the vn column. Am I missing something? Does it generate a long dataset for you?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, you have lost me.  The step I provided takes the wide data, into a long format, so vn takes the variable name into a column.  This is then used in the transpose back up again.  What attributes do you mean, label?  If so you need to extract that also simply grab the label with vlabel() function before the point of output, and save that to a variable as well.  Then when transposing you can use idlabel <variable you created with label in>;

No I do not know "KNN and related analyses", and I am aware that some certain analysis do require data in a set format, so can't really help or advise there.

A_SAS_Man
Pyrite | Level 9

When I run the first block of code:

 

data inter (keep=id vn dollars);
  length vn $200;
  set current_state;
  array v{*} att_;
  do i=1 to dim(v);
    vn=v{i};
    output;
  end;
run;

I get a 3 row dataset with 3 columns (vn,id and Dollars). The vn field is not populated with att_1, att_2, etc it is just blank. Which then when you run the second block of code:

proc transpose data=inter out=want;
  by id;
  var dollars;
  id vn;
  idlabel vn;
run;

You just get a dataset with 3 rows and 2 columns... Which is not the desired output I described above.

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Typo, missing a semi-colon after att_:

data current_state;
  input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;
run;
data inter (keep=id vn dollars);
  length vn $200;
  set current_state;
  array v{*} att_:;
  do i=1 to dim(v);
    vn=v{i};
    output;
  end;
run;

proc transpose data=inter out=want;
  by id;
  var dollars;
  id vn;
  idlabel vn;
run;
art297
Opal | Level 21

I would accomplish the task using two macros that I and others wrote and presented at SAS Global Forums, namely:

 

data current_state;
  input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
  datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=current_state, out=long,
 by=id, var=att, id=na, delimiter=_, copy=Dollars)

filename tr url 'http://tiny.cc/transpose_macro';
%include tr ;
%transpose(data=long, out=want, by=id, var=Dollars, id=att,
   sort=yes, use_varname=no)

Art, CEO, AnalystFinder.com

 

A_SAS_Man
Pyrite | Level 9

One followup question to this one... How would I make your macro work if the data didn't have sequential column names? For example:

 

data current_state;
input id $ att $ type $ def $ part $ section $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

 

art297
Opal | Level 21

@A_SAS_Man: I would simply rename them within the data parameter and use the same code. e.g.:

data current_state;
  input id $ att $ type $ def $ part $ section $ Dollars;
  datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=current_state
  (rename=(att=att1 type=att2 def=att3 part=att4 section=att5)),
   out=long, by=id, var=att, id=na, copy=Dollars)

filename tr url 'http://tiny.cc/transpose_macro';
%include tr ;
%transpose(data=long, out=want, by=id, var=Dollars, id=att,
   sort=yes, use_varname=no)

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1896 views
  • 4 likes
  • 4 in conversation