Hello,
I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. Sample code below:
data tempfile;
infile datalines truncover;
INPUT
StudyID$2.@+1 count StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1;
format Startdate EndDate1 mmddyy10.;
datalines;
01 1 02/01/2007 05/01/2008
02 1 06/02/2012 05/24/2013
03 1 05/07/2010 08/19/2010
04 1 06/16/2007 06/21/2008
05 1 04/28/2007 11/07/2008
06 1 03/16/2007 06/20/2007
07 1 04/18/2007 04/19/2007
08 1 05/11/2007 08/03/2007
09 1 08/01/2007 08/28/2007
09 3 02/21/2008 03/05/2008
10 1 02/06/2008 03/12/2010
11 1 01/31/2007 12/26/2008
11 3 02/11/2009 05/08/2009
11 5 05/19/2009 08/14/2009
11 7 09/22/2009 12/18/2009
11 9 01/09/2010 02/03/2010
11 11 02/23/2010 11/19/2010
11 13 12/01/2010 03/18/2011
11 15 05/10/2011 04/23/2013
11 17 08/03/2013 09/27/2013
;
run;
proc print data=tempfile; run;
Please note that the count variable is left over from a previous data manipulation, which is why there are only odd values instead of sequential numbers (1,3,5,... instead of 1,2,3,...).
I want to further group these measurements into 12-month time periods that start on EndDate1; basically I am interested in following individuals for 12 months after EndDate1. The issue is that I am not sure how to write a code that will allow me to reorganize the data this way. I was thinking about trying to create something like this:
StudyID | count | StartDate | EndDate1 | Date_12mopost | Period |
01 | 1 | 02/01/2007 | 05/01/2008 | 2009-05-01 | 1 |
02 | 1 | 06/02/2012 | 05/24/2013 | 2014-05-24 | 1 |
03 | 1 | 05/07/2010 | 08/19/2010 | 2011-08-19 | 1 |
04 | 1 | 06/16/2007 | 06/21/2008 | 2009-06-21 | 1 |
05 | 1 | 04/28/2007 | 11/07/2008 | 2009-11-07 | 1 |
06 | 1 | 03/16/2007 | 06/20/2007 | 2008-06-19 | 1 |
07 | 1 | 04/18/2007 | 04/19/2007 | 2008-04-18 | 1 |
08 | 1 | 05/11/2007 | 08/03/2007 | 2008-08-02 | 1 |
09 | 1 | 08/01/2007 | 08/28/2007 | 2008-08-27 | 1 |
09 | 3 | 02/21/2008 | 03/05/2008 | 2008-08-27 | 1 |
10 | 1 | 02/06/2008 | 03/12/2010 | 2011-03-12 | 1 |
11 | 1 | 01/31/2007 | 12/26/2008 | 2009-12-26 | 1 |
11 | 3 | 02/11/2009 | 05/08/2009 | 2009-12-26 | 1 |
11 | 5 | 05/19/2009 | 08/14/2009 | 2009-12-26 | 1 |
11 | 7 | 09/22/2009 | 12/18/2009 | 2009-12-26 | 1 |
11 | 9 | 01/09/2010 | 02/03/2010 | 2011-02-03 | 2 (starting over) |
11 | 11 | 02/23/2010 | 11/19/2010 | 2011-02-03 | 2 |
11 | 13 | 12/01/2010 | 03/18/2011 | 2012-03-18 | 3 (starting over) |
11 | 15 | 05/10/2011 | 04/23/2013 | 2014-04-23 | 4 (starting over) |
11 | 17 | 08/03/2013 | 09/27/2013 | 2014-04-23 | 4 |
This will allow me to do what I need to do with the data to eventually be able to analyze it (for example, eventually transposing the data into the appropriate one-year time period, or selecting the first row of the one-year time period). So I guess my questions might be:
So far, I have only managed to create a semi-accurate date variable representing 12 months after EndDate1, using the following code:
data tempfile;
set tempfile;
by studyID count;
retain Date12mopost;
if first.studyID then do;
Date_12mopost=(EndDate1+365);
end;
else do;
Date_12mopost=Date_12mopost;
end;
format Date_12mopost yymmdd10. ;
run;
proc print data = tempfile; run;
StudyID | count | StartDate | EndDate1 | Date_12mopost |
01 | 1 | 02/01/2007 | 05/01/2008 | 2009-05-01 |
02 | 1 | 06/02/2012 | 05/24/2013 | 2014-05-24 |
03 | 1 | 05/07/2010 | 08/19/2010 | 2011-08-19 |
04 | 1 | 06/16/2007 | 06/21/2008 | 2009-06-21 |
05 | 1 | 04/28/2007 | 11/07/2008 | 2009-11-07 |
06 | 1 | 03/16/2007 | 06/20/2007 | 2008-06-19 |
07 | 1 | 04/18/2007 | 04/19/2007 | 2008-04-18 |
08 | 1 | 05/11/2007 | 08/03/2007 | 2008-08-02 |
09 | 1 | 08/01/2007 | 08/28/2007 | 2008-08-27 |
09 | 3 | 02/21/2008 | 03/05/2008 | 2009-03-05 |
10 | 1 | 02/06/2008 | 03/12/2010 | 2011-03-12 |
11 | 1 | 01/31/2007 | 12/26/2008 | 2009-12-26 |
11 | 3 | 02/11/2009 | 05/08/2009 | 2010-05-08 |
11 | 5 | 05/19/2009 | 08/14/2009 | 2010-08-14 |
11 | 7 | 09/22/2009 | 12/18/2009 | 2010-12-18 |
11 | 9 | 01/09/2010 | 02/03/2010 | 2011-02-03 |
11 | 11 | 02/23/2010 | 11/19/2010 | 2011-11-19 |
11 | 13 | 12/01/2010 | 03/18/2011 | 2012-03-17 |
11 | 15 | 05/10/2011 | 04/23/2013 | 2014-04-23 |
11 | 17 | 08/03/2013 | 09/27/2013 | 2014-09-27 |
The Date_12mopost is accurate for each row, but doesn’t help me try to group the periods into the a 12-month time period based on the dates from the preceding row.
I recognize this is a multi-step question that is likely extremely complicated. Any guidance, particularly regarding the specifics of the code, would be greatly appreciated.
Thank you very much in advance!
data want;
set tempfile;
by StudyID;
retain date_12mopost period;
if first.StudyID and last.studyid then do;
date_12mopost= intnx('year', enddate1, 1, 's');
period=1;
end;
else if first.StudyID and not last.studyid then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period=1;
end;
else if not first.studyid then do;
if enddate1>date_12mopost then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period+1;
end;
end;
format date_12mopost mmddyy10.;
run;
Regards,
Naveen Srinivasan
Believe it or not, you're not that far away from working code. First, review this section:
if first.studyID then do;
Date_12mopost=(EndDate1+365);
end;
You could consider that formula OK, or you could experiment with:
date_12mopost = intnx('month', EndDate1, 12, 'same');
Second, add to that section:
if first.studyID then do;
Date_12mopost=(EndDate1+365);
period=1;
end;
Both variables are going to be retained, so there is no need for this code (remove it):
else do;
Date_12mopost=Date_12mopost;
end;
Finally, to increment period (and retain it at the same time):
if StartDate > Date12mo_post then do;
period + 1; /* which will also retain PERIOD */
Date12mo_post = EndDate1 + 365; /* or switch formulas */
end;
data want;
set tempfile;
by StudyID;
retain date_12mopost period;
if first.StudyID and last.studyid then do;
date_12mopost= intnx('year', enddate1, 1, 's');
period=1;
end;
else if first.StudyID and not last.studyid then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period=1;
end;
else if not first.studyid then do;
if enddate1>date_12mopost then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period+1;
end;
end;
format date_12mopost mmddyy10.;
run;
Regards,
Naveen Srinivasan
I believe that this is what you are looking for, if I understand it correctly:
DATA test_data1;
informat startdate mmddyy10. enddate mmddyy10. ;
input studyid count startdate $ enddate $;
format startdate enddate mmddyy10.;
datalines;
1 1 2/1/2007 5/1/2008
2 1 6/2/2012 5/24/2013
3 1 5/7/2010 8/19/2010
4 1 6/16/2007 6/21/2008
5 1 4/28/2007 11/7/2008
6 1 3/16/2007 6/20/2007
7 1 4/18/2007 4/19/2007
8 1 5/11/2007 8/3/2007
9 1 8/1/2007 8/28/2007
9 3 2/21/2008 3/5/2008
10 1 2/6/2008 3/12/2010
11 1 1/31/2007 12/26/2008
11 3 2/11/2009 5/8/2009
11 5 5/19/2009 8/14/2009
11 7 9/22/2009 12/18/2009
11 9 1/9/2010 2/3/2010
11 11 2/23/2010 11/19/2010
11 13 12/1/2010 3/18/2011
11 15 5/10/2011 4/23/2013
11 17 8/3/2013 9/27/2013
;
run;
DATA TEST_DATA1;
SET TEST_DATA1;
BY STUDYID;
FORMAT NEWMONTH MMDDYY10.;
RETAIN NEWMONTH;
IF FIRST.STUDYID THEN NEWMONTH = ENDDATE+365;
IF STARTDATE > NEWMONTH THEN DO;
NEWMONTH = ENDDATE + 365;
END;
RUN;
A gentle warning: not all years have 365 days.
Patrick
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.