Hi. I have a dataset in a wide format where IDs have multiple observations. This was based on where an individual experienced an outcome, underwent demographic changes, group assignment, etc. where a new row was created. I have stripped the superfluous variables and created an abbreviated sample data below. This abbreviated wide data set has study_id, an exposure variable, a group variable, and an indicator of presence for all months in the sample (1=present and contributing person-time, 0=absent), as below:
data have;
input study_id $ exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
S001 exposed a 1 1 0 0 0 0
S001 exposed a 0 0 1 1 0 0
S001 exposed a 0 0 0 0 1 0
S002 exposed b 1 0 0 0 0 0
S002 exposed b 0 1 1 0 0 0
S002 unexposed c 0 0 0 1 1 1
S003 unexposed c 1 1 1 1 1 1
S004 unexposed c 1 1 1 0 0 0
S004 unexposed d 0 0 0 1 1 1
;
RUN;
To prepare for a subsequent analysis, I would like to sum person time by month, exposed status, and group, as follows. I would also like to add a formatted monyy. variable so I can see where in the year each month occurs, as this dataset spans several years.
data want;
input month monyy. month_num exp $ group $ ptime;
format month monyy.;
datalines;
JAN10 1 exp a 1
JAN10 1 exp b 1
JAN10 1 unexp c 2
JAN10 1 unexp d 0
FEB10 2 exp a 1
FEB10 2 exp b 1
FEB10 2 unexp c 2
FEB10 2 unexp d 0
MAR10 3 exp a 1
MAR10 3 exp b 1
MAR10 3 unexp c 2
MAR10 3 unexp d 0
APR10 4 exp a 1
APR10 4 exp b 0
APR10 4 unexp c 2
APR10 4 unexp d 1
MAY10 5 exp a 1
MAY10 5 exp b 0
MAY10 5 unexp c 2
MAY10 5 unexp d 1
JUN10 6 exp a 0
JUN10 6 exp b 0
JUN10 6 unexp c 2
JUN10 6 unexp d 1
;
run;
I understand this may not be the most optimally organized dataset as it stands, but this is what I am working with at present. I may ask an additional question to see if some of this wide dataset formatting could have been bypassed using other start/stop date variables in the dataset, but I will do that in another thread when I have some time.
In its current form, does anyone have any ideas on the most optimal way to get from the multiple wide month variables in "have" to summed person time by month/exp/group as depicted in "want"?
... View more