Hello,
I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. Sample code below:
data tempfile;
infile datalines truncover;
INPUT
StudyID$2.@+1 count StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1;
format Startdate EndDate1 mmddyy10.;
datalines;
01 1 02/01/2007 05/01/2008
02 1 06/02/2012 05/24/2013
03 1 05/07/2010 08/19/2010
04 1 06/16/2007 06/21/2008
05 1 04/28/2007 11/07/2008
06 1 03/16/2007 06/20/2007
07 1 04/18/2007 04/19/2007
08 1 05/11/2007 08/03/2007
09 1 08/01/2007 08/28/2007
09 3 02/21/2008 03/05/2008
10 1 02/06/2008 03/12/2010
11 1 01/31/2007 12/26/2008
11 3 02/11/2009 05/08/2009
11 5 05/19/2009 08/14/2009
11 7 09/22/2009 12/18/2009
11 9 01/09/2010 02/03/2010
11 11 02/23/2010 11/19/2010
11 13 12/01/2010 03/18/2011
11 15 05/10/2011 04/23/2013
11 17 08/03/2013 09/27/2013
;
run;
proc print data=tempfile; run;
Please note that the count variable is left over from a previous data manipulation, which is why there are only odd values instead of sequential numbers (1,3,5,... instead of 1,2,3,...).
I want to further group these measurements into 12-month time periods that start on EndDate1; basically I am interested in following individuals for 12 months after EndDate1. The issue is that I am not sure how to write a code that will allow me to reorganize the data this way. I was thinking about trying to create something like this:
StudyID | count | StartDate | EndDate1 | Date_12mopost | Period |
01 | 1 | 02/01/2007 | 05/01/2008 | 2009-05-01 | 1 |
02 | 1 | 06/02/2012 | 05/24/2013 | 2014-05-24 | 1 |
03 | 1 | 05/07/2010 | 08/19/2010 | 2011-08-19 | 1 |
04 | 1 | 06/16/2007 | 06/21/2008 | 2009-06-21 | 1 |
05 | 1 | 04/28/2007 | 11/07/2008 | 2009-11-07 | 1 |
06 | 1 | 03/16/2007 | 06/20/2007 | 2008-06-19 | 1 |
07 | 1 | 04/18/2007 | 04/19/2007 | 2008-04-18 | 1 |
08 | 1 | 05/11/2007 | 08/03/2007 | 2008-08-02 | 1 |
09 | 1 | 08/01/2007 | 08/28/2007 | 2008-08-27 | 1 |
09 | 3 | 02/21/2008 | 03/05/2008 | 2008-08-27 | 1 |
10 | 1 | 02/06/2008 | 03/12/2010 | 2011-03-12 | 1 |
11 | 1 | 01/31/2007 | 12/26/2008 | 2009-12-26 | 1 |
11 | 3 | 02/11/2009 | 05/08/2009 | 2009-12-26 | 1 |
11 | 5 | 05/19/2009 | 08/14/2009 | 2009-12-26 | 1 |
11 | 7 | 09/22/2009 | 12/18/2009 | 2009-12-26 | 1 |
11 | 9 | 01/09/2010 | 02/03/2010 | 2011-02-03 | 2 (starting over) |
11 | 11 | 02/23/2010 | 11/19/2010 | 2011-02-03 | 2 |
11 | 13 | 12/01/2010 | 03/18/2011 | 2012-03-18 | 3 (starting over) |
11 | 15 | 05/10/2011 | 04/23/2013 | 2014-04-23 | 4 (starting over) |
11 | 17 | 08/03/2013 | 09/27/2013 | 2014-04-23 | 4 |
This will allow me to do what I need to do with the data to eventually be able to analyze it (for example, eventually transposing the data into the appropriate one-year time period, or selecting the first row of the one-year time period). So I guess my questions might be:
So far, I have only managed to create a semi-accurate date variable representing 12 months after EndDate1, using the following code:
data tempfile;
set tempfile;
by studyID count;
retain Date12mopost;
if first.studyID then do;
Date_12mopost=(EndDate1+365);
end;
else do;
Date_12mopost=Date_12mopost;
end;
format Date_12mopost yymmdd10. ;
run;
proc print data = tempfile; run;
StudyID | count | StartDate | EndDate1 | Date_12mopost |
01 | 1 | 02/01/2007 | 05/01/2008 | 2009-05-01 |
02 | 1 | 06/02/2012 | 05/24/2013 | 2014-05-24 |
03 | 1 | 05/07/2010 | 08/19/2010 | 2011-08-19 |
04 | 1 | 06/16/2007 | 06/21/2008 | 2009-06-21 |
05 | 1 | 04/28/2007 | 11/07/2008 | 2009-11-07 |
06 | 1 | 03/16/2007 | 06/20/2007 | 2008-06-19 |
07 | 1 | 04/18/2007 | 04/19/2007 | 2008-04-18 |
08 | 1 | 05/11/2007 | 08/03/2007 | 2008-08-02 |
09 | 1 | 08/01/2007 | 08/28/2007 | 2008-08-27 |
09 | 3 | 02/21/2008 | 03/05/2008 | 2009-03-05 |
10 | 1 | 02/06/2008 | 03/12/2010 | 2011-03-12 |
11 | 1 | 01/31/2007 | 12/26/2008 | 2009-12-26 |
11 | 3 | 02/11/2009 | 05/08/2009 | 2010-05-08 |
11 | 5 | 05/19/2009 | 08/14/2009 | 2010-08-14 |
11 | 7 | 09/22/2009 | 12/18/2009 | 2010-12-18 |
11 | 9 | 01/09/2010 | 02/03/2010 | 2011-02-03 |
11 | 11 | 02/23/2010 | 11/19/2010 | 2011-11-19 |
11 | 13 | 12/01/2010 | 03/18/2011 | 2012-03-17 |
11 | 15 | 05/10/2011 | 04/23/2013 | 2014-04-23 |
11 | 17 | 08/03/2013 | 09/27/2013 | 2014-09-27 |
The Date_12mopost is accurate for each row, but doesn’t help me try to group the periods into the a 12-month time period based on the dates from the preceding row.
I recognize this is a multi-step question that is likely extremely complicated. Any guidance, particularly regarding the specifics of the code, would be greatly appreciated.
Thank you very much in advance!
data want;
set tempfile;
by StudyID;
retain date_12mopost period;
if first.StudyID and last.studyid then do;
date_12mopost= intnx('year', enddate1, 1, 's');
period=1;
end;
else if first.StudyID and not last.studyid then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period=1;
end;
else if not first.studyid then do;
if enddate1>date_12mopost then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period+1;
end;
end;
format date_12mopost mmddyy10.;
run;
Regards,
Naveen Srinivasan
Believe it or not, you're not that far away from working code. First, review this section:
if first.studyID then do;
Date_12mopost=(EndDate1+365);
end;
You could consider that formula OK, or you could experiment with:
date_12mopost = intnx('month', EndDate1, 12, 'same');
Second, add to that section:
if first.studyID then do;
Date_12mopost=(EndDate1+365);
period=1;
end;
Both variables are going to be retained, so there is no need for this code (remove it):
else do;
Date_12mopost=Date_12mopost;
end;
Finally, to increment period (and retain it at the same time):
if StartDate > Date12mo_post then do;
period + 1; /* which will also retain PERIOD */
Date12mo_post = EndDate1 + 365; /* or switch formulas */
end;
data want;
set tempfile;
by StudyID;
retain date_12mopost period;
if first.StudyID and last.studyid then do;
date_12mopost= intnx('year', enddate1, 1, 's');
period=1;
end;
else if first.StudyID and not last.studyid then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period=1;
end;
else if not first.studyid then do;
if enddate1>date_12mopost then do;
date_12mopost=intnx('year', enddate1, 1, 's');
period+1;
end;
end;
format date_12mopost mmddyy10.;
run;
Regards,
Naveen Srinivasan
I believe that this is what you are looking for, if I understand it correctly:
DATA test_data1;
informat startdate mmddyy10. enddate mmddyy10. ;
input studyid count startdate $ enddate $;
format startdate enddate mmddyy10.;
datalines;
1 1 2/1/2007 5/1/2008
2 1 6/2/2012 5/24/2013
3 1 5/7/2010 8/19/2010
4 1 6/16/2007 6/21/2008
5 1 4/28/2007 11/7/2008
6 1 3/16/2007 6/20/2007
7 1 4/18/2007 4/19/2007
8 1 5/11/2007 8/3/2007
9 1 8/1/2007 8/28/2007
9 3 2/21/2008 3/5/2008
10 1 2/6/2008 3/12/2010
11 1 1/31/2007 12/26/2008
11 3 2/11/2009 5/8/2009
11 5 5/19/2009 8/14/2009
11 7 9/22/2009 12/18/2009
11 9 1/9/2010 2/3/2010
11 11 2/23/2010 11/19/2010
11 13 12/1/2010 3/18/2011
11 15 5/10/2011 4/23/2013
11 17 8/3/2013 9/27/2013
;
run;
DATA TEST_DATA1;
SET TEST_DATA1;
BY STUDYID;
FORMAT NEWMONTH MMDDYY10.;
RETAIN NEWMONTH;
IF FIRST.STUDYID THEN NEWMONTH = ENDDATE+365;
IF STARTDATE > NEWMONTH THEN DO;
NEWMONTH = ENDDATE + 365;
END;
RUN;
A gentle warning: not all years have 365 days.
Patrick
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.