DATA Step, Macro, Functions and more

long to wide

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

long to wide

[ Edited ]

Hello

 

i have a dataset with 100k rows and 20 variables

Each row represents a hospital visit by a persons - but i would like to combine the visits onto 1 row. i have been looking at proc transpose and proc array but i am not getting what i would like.

has anyone come across this kind of data manipulation before - any advice would be great

thanks

Lorraine


Accepted Solutions
Solution
‎08-16-2017 03:54 AM
Super User
Super User
Posts: 7,942

Re: long to wide

Posted in reply to Lorraine22

So you have multiple variables to transpose up, so proc transpose would be cumbersome, hence we go into arrays.

Will there only be a maximum of 2 rows per identifier?  If so then:

Data have ;
input identifier house area $ visit $ Treatment $;
cards;
66 33 dublin Jan yes
1300 22 cork March yes
1300 22 cork May yes
2246 4 galway July yes
2246 4 galway October yes
1449 66 dublin November no
;
run;

proc sort data=have;
  by identifier;
run;

data want (drop=element h a v t);
  set have (rename=(house=h area=a visit=v treatment=t));
  by identifier;
  array house{2} 8.;
  array area{2} $20;
  array visit{2} $20;
  array treatment{2} $20;
  retain element house: area: visit: treatment:;
  if first.identifier then do;
    call missing(of house:,of area:,of visit:,of treatment:);
    element=1;
  end;
  else element=element+1;
  house{element}=h;
  area{element}=a;
  visit{element}=v;
  treatment{element}=t;
  if last.identifier then output;
run;

If there are more than two, just increase 2 to whatever you need, or if you want extra glory, precalculate how many you need and drop that in a macro variable and use that.

 

View solution in original post


All Replies
Super Contributor
Posts: 439

Re: long to wide

Posted in reply to Lorraine22

Generally speaking it's much easier to program with a long data set than a wide one but if you must do it then Proc Transpose is the best way. Can you post a data step with some sample data (not an excel file) and let us know how you want it transposed.

Occasional Contributor
Posts: 9

Re: long to wide

Posted in reply to ChrisBrooks


Data have ;

input identifier house area $ visit $ Treatment $;

cards;
66 33 dublin Jan yes
1300 22 cork March yes
1300 22 cork May yes
2246 4 galway July yes
2246 4 galway October yes
1449 66 dublin November no

;
run;

 

data want         
identifierhouseareavisitTreatmentidentifier2house2area2visitTreatment2
6633dublinJanuaryyes     
130022corkMarchyes130022corkMayyes
22464galwayJulyyes22464galwayOctoberyes
144966dublinNovemberno     

 

thanks for your replies

Solution
‎08-16-2017 03:54 AM
Super User
Super User
Posts: 7,942

Re: long to wide

Posted in reply to Lorraine22

So you have multiple variables to transpose up, so proc transpose would be cumbersome, hence we go into arrays.

Will there only be a maximum of 2 rows per identifier?  If so then:

Data have ;
input identifier house area $ visit $ Treatment $;
cards;
66 33 dublin Jan yes
1300 22 cork March yes
1300 22 cork May yes
2246 4 galway July yes
2246 4 galway October yes
1449 66 dublin November no
;
run;

proc sort data=have;
  by identifier;
run;

data want (drop=element h a v t);
  set have (rename=(house=h area=a visit=v treatment=t));
  by identifier;
  array house{2} 8.;
  array area{2} $20;
  array visit{2} $20;
  array treatment{2} $20;
  retain element house: area: visit: treatment:;
  if first.identifier then do;
    call missing(of house:,of area:,of visit:,of treatment:);
    element=1;
  end;
  else element=element+1;
  house{element}=h;
  area{element}=a;
  visit{element}=v;
  treatment{element}=t;
  if last.identifier then output;
run;

If there are more than two, just increase 2 to whatever you need, or if you want extra glory, precalculate how many you need and drop that in a macro variable and use that.

 

Occasional Contributor
Posts: 9

Re: long to wide

thanks so much for this - i will give it a go.

the number of visits will vary per person from 1 up to maybe 10

 

thanks again

Lorraine

Super User
Super User
Posts: 7,942

Re: long to wide

Posted in reply to Lorraine22

Please post test data in the form of a datastep, Excel neither conveys structure, nor is it safe for users to download.  You can use this post;

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

To get a datastep version of your test data, or just type it in quickly.

 

Also a good idea to post what the output should look like.  Proc transpose should be able to handle any normal to transposed assuming only one variable, if more than one variable then arrays is the way to go.

Occasional Contributor
Posts: 9

Re: long to wide

i've some across %multitranspose - which uses macros - anyone familiar with this? macros are not my best friend!

 

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html#sascode

Super User
Posts: 11,343

Re: long to wide

Posted in reply to Lorraine22

Please describe what you are going to do with that data after you have made it wide.

 

Since you are going to have different numbers of variables for each patient you'll likely need to adjust logic for the differing number of variables adding much difficulty to any analysis logic.

 

And if this is for a report it may be easier to use the data directly in a report that will handle repeated values as a column header such as adding a visit number per patient and then using that as a column group variable.

Occasional Contributor
Posts: 9

Re: long to wide

thansk for the reply - i've been on holidays so only getting back to this now

 

i am essentially converting hospital data - which is currently structured by visit - and i want to structure it by patient, so that all the visit information is on the one row. then i will carry out analysis on the patient in SAS

Super User
Super User
Posts: 7,942

Re: long to wide

Posted in reply to Lorraine22

Its unlikely that having visit data as columns would make your analysis any easier.  The CDISC models, pretty standard for clinical data are normalised as this simplifies selection and analysis data.  For instance, if you had:

USUBJID VISIT1 VISIT2 VISIT3 ...

 

Then your programming would need to know the number of visits, an early program may have 3, but full data may have more, therefor you have to program for all these eventualities.  The same thing as

USUBJID  VISIT

...

Uses just those two variables, and then runs on the data that can be as long or short as needed, and with by group processing, you can do analysis - such as means - per visit as:

proc means data=...;
  by visit;
...
run;

Which will produce results per visit.

Super Contributor
Posts: 439

Re: long to wide

There are also going to be a lot of empty cells if most people only have 2 or 3 visits with a tiny number having the full 10. This means a much bigger file with a lot of wasted disk space and longer processing times. In short while I agree "wide" is best for Excel "long" is better for almost any other environment including SAS.

Occasional Contributor
Posts: 9

Re: long to wide

Posted in reply to ChrisBrooks

i agree Chris - higher powers want it set up like this though!!

Occasional Contributor
Posts: 9

Re: long to wide

Posted in reply to Lorraine22

thanks all for your help on this - the SAS skills are a bit rusty so looking forward to increase my learning Smiley Happy

Super Contributor
Posts: 439

Re: long to wide

Posted in reply to Lorraine22

I feel your pain - I've been there also :-)

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 295 views
  • 4 likes
  • 4 in conversation