Hi everyone,
I am trying to reshape my large datset with lots of missing data and variable names that are mixed character and numeric (none of that I want to drop). I have read about both the PROC TRANSPOSE and the ARRAY options to reshape from long to wide and both seem to include the need to write out all of the names of my hundreds of variables. I will need to reshape the data by two different variables ID and Sex in order to make the VISITDATE variable that is currently long, wide. The issue is that many of the VISITDATEs are a different date over a 10 year period in the format DDmonthYY (12march04). Because of this I am thinking I can somehow reshape and make new vars VISITDATE1-VISITDATE20 (the max in the dataset) but having a hard time wrapping my head around the actual array coding.
Can some brilliant programmer in dates and arrays help out?
Thank you so much in advance,
Megan
There is a macro that transposes several variables at once. I've never used it.
%MultiTranspose
http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html
One: some example data and what the final output should be is helpful.
Two: What will you do next with the data? If you don't want to write variable names when creating them the issue will magnify when trying to manipulate them.
If this is a report then proc tabulate and proc report both will generate a column for each level of a variable defined to create groups (report) or class (tabulate).
If the variables are named "nicely" then variable lists do help. If you have a bunch of variables of the same typle that all start with DATE, then DATE: <= note the colon, will allow you to assign them all to an array easily, or if they have sequential names such as Var1 to Var23 then a range list like Var1 - Var23 does similar.
A little concerned about ID and SEX involved with setting a Date variable though.
HI there, thanks for replying. To answer your Qs:
1) Please note that the HTID is the SAME for 2 people (based on a couples study) so I am trying to make individual IDs and you have to do this by sorting by htid and sex. Also note there are multiple observtaion (rows) for each member of the couple (HTID) at different times. Also the variables are unfortuantely not named in a way where I can use an array var1-var25 so will likely need to rename them temporarily to get the data in shape.
Example data have:
HTID SEX AGE VISITDATE var4 var5 var6....var200
5014 M 35 10march2002
5014 M 35 02june2002
5014 M 35 18sept2002
5014 M 35 02jan2003
5014 F 30 10march2002
5014 F 30 26june2002
5023 M 28 20june2003
5023 M 28 21sept2003
5023 M 28 5jan2004
Example data want: each individual with their own row with each visitdate as an unique event
ID HTID SEX AGE VISITDATE1 VISITDATE2 VISITDATE3.....
1 5014 M 35 10march2002 02june2002 18sept2002
2 5014 F 30 10march2002 26june2002
...
2) I am happy to write variable names if that is the only way, I just wasn't sure. I thought maybe it would be easier to rename them to some convention like var1-var200 in order to get the data in shape first with arrays, etc.
Eventually, after descriptive stats, etc. I will be be perfoming some multivariate logistic regressions and build marginal structural models etc.
Please provide some sample data "want" and explain us what you need and why.
It's most of the time much better to have data in a long instead of a wide structure so I would like to understand why you want it the other way round.
Hi Patrick,
Please note that the HTID is the SAME for 2 people (based on a couples study) so I am trying to make individual IDs and you have to do this by sorting by htid and sex. Also note there are multiple observtaion (rows) for each member of the couple (HTID) at different times. Also the variables are unfortuantely not named in a way where I can use an array var1-var25 so will likely need to rename them temporarily to get the data in shape.
Example data have:
HTID SEX AGE VISITDATE var4 var5 var6....var200
5014 M 35 10march2002
5014 M 35 02june2002
5014 M 35 18sept2002
5014 M 35 02jan2003
5014 F 30 10march2002
5014 F 30 26june2002
5023 M 28 20june2003
5023 M 28 21sept2003
5023 M 28 5jan2004
Example data want: each individual with their own row with each visitdate as an unique event
ID HTID SEX AGE VISITDATE1 VISITDATE2 VISITDATE3.....
1 5014 M 35 10march2002 02june2002 18sept2002
2 5014 F 30 10march2002 26june2002
...
Eventually, after descriptive stats, etc. I will be be perfoming some multivariate logistic regressions and build marginal structural models etc.
You can use two variables to uniquely define your individuals. PROC MEANS; BY HTID SEX; for example.
Or you can make a new unique id variable if you want.
data want ;
set have;
by htid sex;
uid + first.sex ;
run;
Thank you. So that will just give me a unique ID. But what about then reshaping VISIDATE? Especially since I want to transpose multiple variables (var4_visit1, var4_visit2, etc. ) by visitdate? Any idea if that is possible? THanks again.
I wouldn't want to try to transpose more than 2 variables especially in a large dataset. Since you mentioned that you have missing data and mixed formated data, you might end up with a lot of problems if you try to reshape multiple variables.
There is a macro that transposes several variables at once. I've never used it.
%MultiTranspose
http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html
Hi, using the data youi posted ...
data x;
input htid sex :$1. age visitdate :anydtdte.;
datalines;
5014 M 35 10MARCH2002
5014 M 35 02JUNE2002
5014 M 35 18SEPT2002
5014 M 35 02JAN2003
5014 F 30 10MARCH2002
5014 F 30 26JUNE2002
5023 M 28 20JUNE2003
5023 M 28 21SEPT2003
5023 M 28 5JAN2004
;
proc sql noprint;
select max(obs) into :nobs from (select count(*) as obs from x group by htid, sex, age);
quit;
proc summary nway data=x missing;
class htid sex age;
output out=x_wide(drop=_type_ _freq_) idgroup(out[&nobs] (visitdate)=);
format v: date9.;
run;
data set X_WIDE ...
visitdate_ visitdate_ visitdate_ visitdate_
Obs htid sex age 1 2 3 4
1 5014 F 30 10MAR2002 26JUN2002 . .
2 5014 M 35 10MAR2002 02JUN2002 18SEP2002 02JAN2003
3 5023 M 28 20JUN2003 21SEP2003 05JAN2004 .
Take a look at ...
Transposing Data Using PROC SUMMARY'S IDGROUP Option
http://support.sas.com/resources/papers/proceedings10/102-2010.pdf
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.