BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mcdj
Obsidian | Level 7

Hi everyone,

 

I am trying to reshape my large datset with lots of missing data and variable names that are mixed character and numeric (none of that I want to drop). I have read about both the PROC TRANSPOSE and the ARRAY options to reshape from long to wide and both seem to include the need to write out all of the names of my hundreds of variables. I will need to reshape the data by two different variables ID and Sex in order to make the VISITDATE variable that is currently long, wide. The issue is that many of the VISITDATEs are a different date over a 10 year period in the format DDmonthYY (12march04).  Because of this I am thinking I can somehow reshape and make new vars VISITDATE1-VISITDATE20 (the max in the dataset) but having a hard time wrapping my head around the actual array coding.

 

Can some brilliant programmer in dates and arrays help out?

Thank you so much in advance,

Megan

1 ACCEPTED SOLUTION

Accepted Solutions
slangan
Obsidian | Level 7

There is a macro that transposes several variables at once. I've never used it.

 

 %MultiTranspose

 

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html

View solution in original post

10 REPLIES 10
ballardw
Super User

One: some example data and what the final output should be is helpful.

Two: What will you do next with the data? If you don't want to write variable names when creating them the issue will magnify when trying to manipulate them.

 

If this is a report then proc tabulate and proc report both will generate a column for each level of a variable defined to create groups (report) or class (tabulate).

 

If the variables are named "nicely" then variable lists do help. If you have a bunch of variables of the same typle that all start with DATE, then DATE: <= note the colon, will allow you to assign them all to an array easily, or if they have sequential names such as Var1 to Var23 then a range list like Var1 - Var23 does similar.

 

A little concerned about ID and SEX involved with setting a Date variable though.

mcdj
Obsidian | Level 7

HI there, thanks for replying. To answer your Qs:

 

1) Please note that the HTID is the SAME for 2 people (based on a couples study) so I am trying to make individual IDs and you have to do this by sorting by htid and sex. Also note there are multiple observtaion (rows) for each member of the couple (HTID) at different times. Also the variables are unfortuantely not named in a way where I can use an array var1-var25 so will likely need to rename them temporarily to get the data in shape.

 

Example data have:

HTID SEX AGE VISITDATE       var4   var5    var6....var200

5014 M     35      10march2002

5014 M     35       02june2002

5014 M     35       18sept2002

5014 M     35       02jan2003

5014 F     30       10march2002

5014 F     30       26june2002

5023 M    28       20june2003

5023 M    28       21sept2003

5023 M    28       5jan2004

 

Example data want: each individual with their own row with each visitdate as an unique event

 

ID  HTID  SEX AGE  VISITDATE1    VISITDATE2    VISITDATE3.....

1   5014  M     35      10march2002  02june2002     18sept2002

 

2   5014  F      30      10march2002   26june2002

...

 

 

2) I am happy to write variable names if that is the only way, I just wasn't sure. I thought maybe it would be easier to rename them to some convention like var1-var200 in order to get the data in shape first with arrays, etc.

 

Eventually, after descriptive stats, etc. I will be be perfoming some multivariate logistic regressions and build marginal structural models etc.

Patrick
Opal | Level 21

Please provide some sample data "want" and explain us what you need and why.

 

It's most of the time much better to have data in a long instead of a wide structure so I would like to understand why you want it the other way round.

mcdj
Obsidian | Level 7

Hi Patrick, 

 

Please note that the HTID is the SAME for 2 people (based on a couples study) so I am trying to make individual IDs and you have to do this by sorting by htid and sex. Also note there are multiple observtaion (rows) for each member of the couple (HTID) at different times. Also the variables are unfortuantely not named in a way where I can use an array var1-var25 so will likely need to rename them temporarily to get the data in shape.

 

Example data have:

HTID SEX AGE VISITDATE       var4   var5    var6....var200

5014 M     35      10march2002

5014 M     35       02june2002

5014 M     35       18sept2002

5014 M     35       02jan2003

5014 F     30       10march2002

5014 F     30       26june2002

5023 M    28       20june2003

5023 M    28       21sept2003

5023 M    28       5jan2004

 

Example data want: each individual with their own row with each visitdate as an unique event

 

ID  HTID  SEX AGE  VISITDATE1    VISITDATE2    VISITDATE3.....

1   5014  M     35      10march2002  02june2002     18sept2002

2   5014  F      30      10march2002   26june2002

...

 

 

Eventually, after descriptive stats, etc. I will be be perfoming some multivariate logistic regressions and build marginal structural models etc.

Tom
Super User Tom
Super User

You can use two variables to uniquely define your individuals.  PROC MEANS; BY HTID SEX;  for example.

Or you can make a new unique id variable if you want.

data want ;

  set have;

  by htid sex;

  uid + first.sex ;

run;

 

mcdj
Obsidian | Level 7

Thank you. So that will just give me a unique ID. But what about then reshaping VISIDATE? Especially since I want to transpose multiple variables (var4_visit1, var4_visit2, etc. ) by visitdate? Any idea if that is possible? THanks again.

slangan
Obsidian | Level 7

I wouldn't want to try to transpose more than 2 variables especially in a large dataset. Since you mentioned that you have missing data and mixed formated data, you might end up with a lot of problems if you try to reshape multiple variables.

slangan
Obsidian | Level 7

There is a macro that transposes several variables at once. I've never used it.

 

 %MultiTranspose

 

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html

MikeZdeb
Rhodochrosite | Level 12

Hi, using the data youi posted ...

 

data x;
input htid sex :$1. age visitdate :anydtdte.;
datalines;
5014 M 35 10MARCH2002
5014 M 35  02JUNE2002
5014 M 35  18SEPT2002
5014 M 35  02JAN2003
5014 F 30  10MARCH2002
5014 F 30  26JUNE2002
5023 M 28  20JUNE2003
5023 M 28  21SEPT2003
5023 M 28  5JAN2004
;


proc sql noprint;
select max(obs) into :nobs from (select count(*) as obs from x group by htid, sex, age);
quit;

 

proc summary nway data=x missing;
class htid sex age;
output out=x_wide(drop=_type_ _freq_) idgroup(out[&nobs] (visitdate)=);
format v: date9.;
run;

 

data set X_WIDE ...

                             visitdate_    visitdate_    visitdate_    visitdate_
Obs    htid    sex    age        1             2             3             4

 1     5014     F      30    10MAR2002     26JUN2002             .             .
 2     5014     M      35    10MAR2002     02JUN2002     18SEP2002     02JAN2003
 3     5023     M      28    20JUN2003     21SEP2003     05JAN2004             .

 

Take a look at ...

 

Transposing Data Using PROC SUMMARY'S IDGROUP Option

http://support.sas.com/resources/papers/proceedings10/102-2010.pdf

mcdj
Obsidian | Level 7
thanks, this worked too! I ended up using the Macro in the end but just checked it against your code which was great. Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 6898 views
  • 5 likes
  • 6 in conversation