DATA Step, Macro, Functions and more

How to re-group/organize row observations based on dates in certain rows?

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

How to re-group/organize row observations based on dates in certain rows?

Hello,

 

I have a database where subjects have repeated measurements. These measurements are taken during specific time periods delineated by StartDate and EndDate1. Sample code below:

 

data tempfile;

infile datalines truncover;

INPUT

StudyID$2.@+1 count StartDate mmddyy10.@+1 EndDate1 mmddyy10.@+1;

format Startdate EndDate1 mmddyy10.;

datalines;

01 1 02/01/2007 05/01/2008 

02 1 06/02/2012 05/24/2013 

03 1 05/07/2010 08/19/2010 

04 1 06/16/2007 06/21/2008 

05 1 04/28/2007 11/07/2008 

06 1 03/16/2007 06/20/2007 

07 1 04/18/2007 04/19/2007 

08 1 05/11/2007 08/03/2007 

09 1 08/01/2007 08/28/2007 

09 3 02/21/2008 03/05/2008 

10 1 02/06/2008 03/12/2010 

11 1 01/31/2007 12/26/2008 

11 3 02/11/2009 05/08/2009 

11 5 05/19/2009 08/14/2009 

11 7 09/22/2009 12/18/2009 

11 9 01/09/2010 02/03/2010 

11 11 02/23/2010 11/19/2010 

11 13 12/01/2010 03/18/2011 

11 15 05/10/2011 04/23/2013 

11 17 08/03/2013 09/27/2013 

;

run;

proc print data=tempfile; run;

 

Please note that the count variable is left over from a previous data manipulation, which is why there are only odd values instead of sequential numbers (1,3,5,... instead of 1,2,3,...).

 

I want to further group these measurements into 12-month time periods that start on EndDate1; basically I am interested in following individuals for 12 months after EndDate1. The issue is that I am not sure how to write a code that will allow me to reorganize the data this way. I was thinking about trying to create something like this:

 

StudyID

count

StartDate

EndDate1

Date_12mopost

Period

01

1

02/01/2007

05/01/2008

2009-05-01

1

02

1

06/02/2012

05/24/2013

2014-05-24

1

03

1

05/07/2010

08/19/2010

2011-08-19

1

04

1

06/16/2007

06/21/2008

2009-06-21

1

05

1

04/28/2007

11/07/2008

2009-11-07

1

06

1

03/16/2007

06/20/2007

2008-06-19

1

07

1

04/18/2007

04/19/2007

2008-04-18

1

08

1

05/11/2007

08/03/2007

2008-08-02

1

09

1

08/01/2007

08/28/2007

2008-08-27

1

09

3

02/21/2008

03/05/2008

2008-08-27

1

10

1

02/06/2008

03/12/2010

2011-03-12

1

11

1

01/31/2007

12/26/2008

2009-12-26

1

11

3

02/11/2009

05/08/2009

2009-12-26

1

11

5

05/19/2009

08/14/2009

2009-12-26

1

11

7

09/22/2009

12/18/2009

2009-12-26

1

11

9

01/09/2010

02/03/2010

2011-02-03

2 (starting over)

11

11

02/23/2010

11/19/2010

2011-02-03

2

11

13

12/01/2010

03/18/2011

2012-03-18

3 (starting over)

11

15

05/10/2011

04/23/2013

2014-04-23

4 (starting over)

11

17

08/03/2013

09/27/2013

2014-04-23

4

 

This will allow me to do what I need to do with the data to eventually be able to analyze it (for example, eventually transposing the data into the appropriate one-year time period, or selecting the first row of the one-year time period). So I guess my questions might be:

  • How do I assign a date 12 months after the EndDate1 IN THE FIRST ROW FOR EACH SUBJECT (e.g. Subject 09)?
  • How about later on, for repeat measurements, if the EndDate1 takes place AFTER the 12-month mark, and I want to start the clock over again with a new 12 month period (e.g. Subject 11)?
  • Can a create a new “Period” variable to reflect these dates, to make it easier to keep the 12-month time periods organized?

 

So far, I have only managed to create a semi-accurate date variable representing 12 months after EndDate1, using the following code:

 

data tempfile;

  set tempfile;

  by studyID count;

  retain Date12mopost;

  if first.studyID then do;

     Date_12mopost=(EndDate1+365);

  end;

  else do;

     Date_12mopost=Date_12mopost;

  end;

  format Date_12mopost yymmdd10. ;

run;

proc print data = tempfile; run;

 

StudyID

count

StartDate

EndDate1

Date_12mopost

01

1

02/01/2007

05/01/2008

2009-05-01

02

1

06/02/2012

05/24/2013

2014-05-24

03

1

05/07/2010

08/19/2010

2011-08-19

04

1

06/16/2007

06/21/2008

2009-06-21

05

1

04/28/2007

11/07/2008

2009-11-07

06

1

03/16/2007

06/20/2007

2008-06-19

07

1

04/18/2007

04/19/2007

2008-04-18

08

1

05/11/2007

08/03/2007

2008-08-02

09

1

08/01/2007

08/28/2007

2008-08-27

09

3

02/21/2008

03/05/2008

2009-03-05

10

1

02/06/2008

03/12/2010

2011-03-12

11

1

01/31/2007

12/26/2008

2009-12-26

11

3

02/11/2009

05/08/2009

2010-05-08

11

5

05/19/2009

08/14/2009

2010-08-14

11

7

09/22/2009

12/18/2009

2010-12-18

11

9

01/09/2010

02/03/2010

2011-02-03

11

11

02/23/2010

11/19/2010

2011-11-19

11

13

12/01/2010

03/18/2011

2012-03-17

11

15

05/10/2011

04/23/2013

2014-04-23

11

17

08/03/2013

09/27/2013

2014-09-27

 

The Date_12mopost is accurate for each row, but doesn’t help me try to group the periods into the a 12-month time period based on the dates from the preceding row.

 

I recognize this is a multi-step question that is likely extremely complicated. Any guidance, particularly regarding the specifics of the code, would be greatly appreciated.

 

Thank you very much in advance!


Accepted Solutions
Solution
‎05-02-2017 01:35 PM
PROC Star
Posts: 172

Re: How to re-group/organize row observations based on dates in certain rows?

data want;

set tempfile;

by   StudyID;

retain date_12mopost period;

if first.StudyID and last.studyid then do;

date_12mopost= intnx('year', enddate1, 1, 's');

period=1;

end;

else if first.StudyID and not last.studyid then do;

date_12mopost=intnx('year', enddate1, 1, 's');

period=1;

end;

else if not first.studyid then do;

if   enddate1>date_12mopost then do;

date_12mopost=intnx('year', enddate1, 1, 's');

period+1;

end;

end;

format     date_12mopost mmddyy10.;

run;

 

Regards,

Naveen Srinivasan

View solution in original post


All Replies
Super User
Posts: 5,085

Re: How to re-group/organize row observations based on dates in certain rows?

Believe it or not, you're not that far away from working code.  First, review this section:

 

  if first.studyID then do;

     Date_12mopost=(EndDate1+365);

  end;

 

You could consider that formula OK, or you could experiment with:

 

date_12mopost = intnx('month', EndDate1, 12, 'same');

 

Second, add to that section:

 

  if first.studyID then do;

     Date_12mopost=(EndDate1+365);

     period=1;

  end;

 

Both variables are going to be retained, so there is no need for this code (remove it):

 

  else do;

     Date_12mopost=Date_12mopost;

  end;

 

Finally, to increment period (and retain it at the same time):

 

if StartDate > Date12mo_post then do;

   period + 1;  /* which will also retain PERIOD */

   Date12mo_post = EndDate1 + 365;  /* or switch formulas */

end;

Solution
‎05-02-2017 01:35 PM
PROC Star
Posts: 172

Re: How to re-group/organize row observations based on dates in certain rows?

data want;

set tempfile;

by   StudyID;

retain date_12mopost period;

if first.StudyID and last.studyid then do;

date_12mopost= intnx('year', enddate1, 1, 's');

period=1;

end;

else if first.StudyID and not last.studyid then do;

date_12mopost=intnx('year', enddate1, 1, 's');

period=1;

end;

else if not first.studyid then do;

if   enddate1>date_12mopost then do;

date_12mopost=intnx('year', enddate1, 1, 's');

period+1;

end;

end;

format     date_12mopost mmddyy10.;

run;

 

Regards,

Naveen Srinivasan

Frequent Contributor
Posts: 93

Re: How to re-group/organize row observations based on dates in certain rows?

I believe that this is what you are looking for, if I understand it correctly:

 

DATA test_data1; 
informat startdate mmddyy10. enddate mmddyy10. ;
input studyid count startdate $ enddate $;  
format startdate enddate mmddyy10.; 
datalines;
1	1	2/1/2007	5/1/2008
2	1	6/2/2012	5/24/2013
3	1	5/7/2010	8/19/2010
4	1	6/16/2007	6/21/2008
5	1	4/28/2007	11/7/2008
6	1	3/16/2007	6/20/2007
7	1	4/18/2007	4/19/2007
8	1	5/11/2007	8/3/2007
9	1	8/1/2007	8/28/2007
9	3	2/21/2008	3/5/2008
10	1	2/6/2008	3/12/2010
11	1	1/31/2007	12/26/2008
11	3	2/11/2009	5/8/2009
11	5	5/19/2009	8/14/2009
11	7	9/22/2009	12/18/2009
11	9	1/9/2010	2/3/2010
11	11	2/23/2010	11/19/2010
11	13	12/1/2010	3/18/2011
11	15	5/10/2011	4/23/2013
11	17	8/3/2013	9/27/2013
;
run;

DATA TEST_DATA1;
SET TEST_DATA1;
BY STUDYID;
FORMAT NEWMONTH MMDDYY10.;
RETAIN NEWMONTH;
IF FIRST.STUDYID THEN NEWMONTH = ENDDATE+365;
IF STARTDATE > NEWMONTH THEN DO; 
NEWMONTH = ENDDATE + 365;
END;
RUN;

 

A gentle warning: not all years have 365 days.

 

Patrick

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 137 views
  • 0 likes
  • 4 in conversation