Help using Base SAS procedures

Reshaping a large data set from long to wide

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

Reshaping a large data set from long to wide

Hi everyone,

 

I am trying to reshape my large datset with lots of missing data and variable names that are mixed character and numeric (none of that I want to drop). I have read about both the PROC TRANSPOSE and the ARRAY options to reshape from long to wide and both seem to include the need to write out all of the names of my hundreds of variables. I will need to reshape the data by two different variables ID and Sex in order to make the VISITDATE variable that is currently long, wide. The issue is that many of the VISITDATEs are a different date over a 10 year period in the format DDmonthYY (12march04).  Because of this I am thinking I can somehow reshape and make new vars VISITDATE1-VISITDATE20 (the max in the dataset) but having a hard time wrapping my head around the actual array coding.

 

Can some brilliant programmer in dates and arrays help out?

Thank you so much in advance,

Megan


Accepted Solutions
Solution
‎10-23-2015 05:53 PM
Contributor
Posts: 20

Re: Reshaping a large data set from long to wide

There is a macro that transposes several variables at once. I've never used it.

 

 %MultiTranspose

 

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html

View solution in original post


All Replies
Super User
Posts: 10,497

Re: Reshaping a large data set from long to wide

One: some example data and what the final output should be is helpful.

Two: What will you do next with the data? If you don't want to write variable names when creating them the issue will magnify when trying to manipulate them.

 

If this is a report then proc tabulate and proc report both will generate a column for each level of a variable defined to create groups (report) or class (tabulate).

 

If the variables are named "nicely" then variable lists do help. If you have a bunch of variables of the same typle that all start with DATE, then DATE: <= note the colon, will allow you to assign them all to an array easily, or if they have sequential names such as Var1 to Var23 then a range list like Var1 - Var23 does similar.

 

A little concerned about ID and SEX involved with setting a Date variable though.

Occasional Contributor
Posts: 17

Re: Reshaping a large data set from long to wide

HI there, thanks for replying. To answer your Qs:

 

1) Please note that the HTID is the SAME for 2 people (based on a couples study) so I am trying to make individual IDs and you have to do this by sorting by htid and sex. Also note there are multiple observtaion (rows) for each member of the couple (HTID) at different times. Also the variables are unfortuantely not named in a way where I can use an array var1-var25 so will likely need to rename them temporarily to get the data in shape.

 

Example data have:

HTID SEX AGE VISITDATE       var4   var5    var6....var200

5014 M     35      10march2002

5014 M     35       02june2002

5014 M     35       18sept2002

5014 M     35       02jan2003

5014 F     30       10march2002

5014 F     30       26june2002

5023 M    28       20june2003

5023 M    28       21sept2003

5023 M    28       5jan2004

 

Example data want: each individual with their own row with each visitdate as an unique event

 

ID  HTID  SEX AGE  VISITDATE1    VISITDATE2    VISITDATE3.....

1   5014  M     35      10march2002  02june2002     18sept2002

 

2   5014  F      30      10march2002   26june2002

...

 

 

2) I am happy to write variable names if that is the only way, I just wasn't sure. I thought maybe it would be easier to rename them to some convention like var1-var200 in order to get the data in shape first with arrays, etc.

 

Eventually, after descriptive stats, etc. I will be be perfoming some multivariate logistic regressions and build marginal structural models etc.

Respected Advisor
Posts: 3,889

Re: Reshaping a large data set from long to wide

[ Edited ]

Please provide some sample data "want" and explain us what you need and why.

 

It's most of the time much better to have data in a long instead of a wide structure so I would like to understand why you want it the other way round.

Occasional Contributor
Posts: 17

Re: Reshaping a large data set from long to wide

Hi Patrick, 

 

Please note that the HTID is the SAME for 2 people (based on a couples study) so I am trying to make individual IDs and you have to do this by sorting by htid and sex. Also note there are multiple observtaion (rows) for each member of the couple (HTID) at different times. Also the variables are unfortuantely not named in a way where I can use an array var1-var25 so will likely need to rename them temporarily to get the data in shape.

 

Example data have:

HTID SEX AGE VISITDATE       var4   var5    var6....var200

5014 M     35      10march2002

5014 M     35       02june2002

5014 M     35       18sept2002

5014 M     35       02jan2003

5014 F     30       10march2002

5014 F     30       26june2002

5023 M    28       20june2003

5023 M    28       21sept2003

5023 M    28       5jan2004

 

Example data want: each individual with their own row with each visitdate as an unique event

 

ID  HTID  SEX AGE  VISITDATE1    VISITDATE2    VISITDATE3.....

1   5014  M     35      10march2002  02june2002     18sept2002

2   5014  F      30      10march2002   26june2002

...

 

 

Eventually, after descriptive stats, etc. I will be be perfoming some multivariate logistic regressions and build marginal structural models etc.

Super User
Super User
Posts: 6,499

Re: Reshaping a large data set from long to wide

You can use two variables to uniquely define your individuals.  PROC MEANS; BY HTID SEX;  for example.

Or you can make a new unique id variable if you want.

data want ;

  set have;

  by htid sex;

  uid + first.sex ;

run;

 

Occasional Contributor
Posts: 17

Re: Reshaping a large data set from long to wide

Thank you. So that will just give me a unique ID. But what about then reshaping VISIDATE? Especially since I want to transpose multiple variables (var4_visit1, var4_visit2, etc. ) by visitdate? Any idea if that is possible? THanks again.

Contributor
Posts: 20

Re: Reshaping a large data set from long to wide

[ Edited ]

I wouldn't want to try to transpose more than 2 variables especially in a large dataset. Since you mentioned that you have missing data and mixed formated data, you might end up with a lot of problems if you try to reshape multiple variables.

Solution
‎10-23-2015 05:53 PM
Contributor
Posts: 20

Re: Reshaping a large data set from long to wide

There is a macro that transposes several variables at once. I've never used it.

 

 %MultiTranspose

 

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html

Valued Guide
Posts: 765

Re: Reshaping a large data set from long to wide

[ Edited ]

Hi, using the data youi posted ...

 

data x;
input htid sex :$1. age visitdate :anydtdte.;
datalines;
5014 M 35 10MARCH2002
5014 M 35  02JUNE2002
5014 M 35  18SEPT2002
5014 M 35  02JAN2003
5014 F 30  10MARCH2002
5014 F 30  26JUNE2002
5023 M 28  20JUNE2003
5023 M 28  21SEPT2003
5023 M 28  5JAN2004
;


proc sql noprint;
select max(obs) into :nobs from (select count(*) as obs from x group by htid, sex, age);
quit;

 

proc summary nway data=x missing;
class htid sex age;
output out=x_wide(drop=_type_ _freq_) idgroup(out[&nobs] (visitdate)=);
format v: date9.;
run;

 

data set X_WIDE ...

                             visitdate_    visitdate_    visitdate_    visitdate_
Obs    htid    sex    age        1             2             3             4

 1     5014     F      30    10MAR2002     26JUN2002             .             .
 2     5014     M      35    10MAR2002     02JUN2002     18SEP2002     02JAN2003
 3     5023     M      28    20JUN2003     21SEP2003     05JAN2004             .

 

Take a look at ...

 

Transposing Data Using PROC SUMMARY'S IDGROUP Option

http://support.sas.com/resources/papers/proceedings10/102-2010.pdf

Occasional Contributor
Posts: 17

Re: Reshaping a large data set from long to wide

thanks, this worked too! I ended up using the Macro in the end but just checked it against your code which was great. Thanks!
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 1914 views
  • 5 likes
  • 6 in conversation