Solved: How to separate out and combine dates within a week of eachother for t...

andym90 · Posted 04-15-2022 06:27 PM

Dear all,

Below is an example of the dataset I am working with. I am trying to combine dates within a week of each other per ID. I have tried using lag and have been able to create a flag for records that have dates within a week of each other but am encountering an error when one person can have multiple records with multiple separate dates within a week (If that makes sense). I think I need to somehow create a unique ID again and cluster the data but I am a bit lost.

ID	Date	Test
1	Jan 1, 2005	0
1	Jan 2, 2005	1
2	Jan 1, 2005	0
2	Jan 2, 2005	1
2	Feb 1, 2005	1
2	Feb 3, 2005	1
3	March 2, 2005	1
3	March 3, 2005	1
4	April 10, 2006	1

My goal would be to have the end data (using max of tested and min of date), look like this:

ID	Date	Tested
1	Jan 1, 2005	1
2	Jan 1, 2005	1
2	Feb 1, 2005	1
3	March 2, 2005	1
4	April 10, 2006	1

Thank you for any help or consideration,

mkeintz · Posted 04-18-2022 11:44 PM

LAG is not the way to go for this problem, because - among other reasons - it won't tell you whether the upcoming observation is more than seven days in the future. And you need to know that in order to determine whether you have exhausted your seven-day period. In particular, you need

a way to retain the first date of the 1-week period and the maximum test score over that same week
and
a way to know when to output those retained values, i.e. when the next obs is over a week after the beginning of the week ... or the beginning of a new id.

You'll have to test this below, in the absence of sample data in the form of a working data step:

data want (drop=_:  nxt_:);
  set have (keep=id);
  by id;
  merge have have (firstobs=2 keep=date test rename=(date=nxt_date test=nxt_test));

  retain _temp_test _temp_date ;
  if missing(_temp_test) then _temp_test=test;
  if missing(_temp_date) then _temp_date=date;

  if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
  if last.id=1 or nxt_date>_temp_date+7;
  date=_temp_date;
  test=_temp_test;
  output;
  call missing(of _temp:);
run;

This assumes the data are sorted by ID/DATE.

The _temp... variables are the prospective final values as you scan for data within a week. Notice this program looks for dates within a week of the FIRST date of the period. So if you have observations on the 1st, 4th, and 8th, only the 1st and 4th will be considered. The 8th would start a new "week". If, on the other hand you want to consolidate all observations until there is a gap of over a week, then change

  if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
  if last.id=1 or nxt_date>_temp_date+7;

to

  if last.id=0 and nxt_date<=date+7 then _temp_test=max(_temp_test,nxt_test);
  if last.id=1 or nxt_date>date+7;

The SET statement reads only the ID variable. In tandem with the BY statement, it provides accurate indication of the beginning of an ID (first.id=1) and the end of an id (last.id=1).

The MERGE statement merges each obs with the next obs, providing a way to look ahead at the upcoming date, and if necessary, the upcoming test.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

HB · Posted 04-18-2022 06:33 PM

Are we assuming a week is 7 consecutive days and not one of 52 periods or some number of Sun-Sat periods in a given year or leap year?

What do you want done with cases where an individual has for example a Jan 1, a Jan 5, and a Jan 10 test date? Jan 5 is within one "week" of Jan 1, Jan 10 is within one "week" of Jan 5, but Jan 10 is not within one week of Jan 1.

mkeintz · Posted 04-18-2022 11:44 PM

LAG is not the way to go for this problem, because - among other reasons - it won't tell you whether the upcoming observation is more than seven days in the future. And you need to know that in order to determine whether you have exhausted your seven-day period. In particular, you need

a way to retain the first date of the 1-week period and the maximum test score over that same week
and
a way to know when to output those retained values, i.e. when the next obs is over a week after the beginning of the week ... or the beginning of a new id.

You'll have to test this below, in the absence of sample data in the form of a working data step:

data want (drop=_:  nxt_:);
  set have (keep=id);
  by id;
  merge have have (firstobs=2 keep=date test rename=(date=nxt_date test=nxt_test));

  retain _temp_test _temp_date ;
  if missing(_temp_test) then _temp_test=test;
  if missing(_temp_date) then _temp_date=date;

  if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
  if last.id=1 or nxt_date>_temp_date+7;
  date=_temp_date;
  test=_temp_test;
  output;
  call missing(of _temp:);
run;

This assumes the data are sorted by ID/DATE.

The _temp... variables are the prospective final values as you scan for data within a week. Notice this program looks for dates within a week of the FIRST date of the period. So if you have observations on the 1st, 4th, and 8th, only the 1st and 4th will be considered. The 8th would start a new "week". If, on the other hand you want to consolidate all observations until there is a gap of over a week, then change

  if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
  if last.id=1 or nxt_date>_temp_date+7;

to

  if last.id=0 and nxt_date<=date+7 then _temp_test=max(_temp_test,nxt_test);
  if last.id=1 or nxt_date>date+7;

The SET statement reads only the ID variable. In tandem with the BY statement, it provides accurate indication of the beginning of an ID (first.id=1) and the end of an id (last.id=1).

The MERGE statement merges each obs with the next obs, providing a way to look ahead at the upcoming date, and if necessary, the upcoming test.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

andreas_lds · Posted 04-19-2022 01:50 AM

Not sure if you want 7-days-intervals or weeks. If you actually want weeks, try:

proc summary data=have nway;
   class Id Date;
   var Test;
   format Date weekv5.;
   output out= want(drop= _:) max=;
run;

proc datasets library= work nolist;
   modify want;
      format Date date9.;
quit;

How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Re: How to separate out and combine dates within a week of eachother for those with the same ID?

Registration is open

SAS Training: Just a Click Away