BookmarkSubscribeRSS Feed
Emma2021
Quartz | Level 8

I have the following data:

data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021

;

 

I would like to keep only all observations with 10 or more days difference by ID, but if within 10-days then type should be with smallest value's row (see id=1 and type=A-1; also id=4 and type=A-1) even the date was higher than the previous date, see wanted data:

ID TYPE DATE
1 A-1 1/4/2021
1 A-1 1/20/2021
2 A-1 1/1/2021
3 A-1 1/1/2021
4 A-1 1/5/2021
5 A-2 2/1/2021

 

 

How can I accomplish this? Thank you.

12 REPLIES 12
Kurt_Bremser
Super User

I have two approaches, one using two outputs, the other a look-ahead:

data want;
set have;
by id date;
retain _date _type;
if first.id
then do;
  _type = type;
  _date = date;
end;
_type = ifc(type < _type,type,_type);
if date - _date gt 10
then do;
  _t = type;
  _d = date;
  type = _type;
  date = _date;
  output;
  _type = _t;
  date = _d;
end;
if last.id
then do;
  type = _type;
  output;
end;
_date = date;
drop _:;
run;

data want2;
set have;
by id;
set
  have (
    firstobs=2
    keep=id date
    rename=(
      id = _id
      date = _date
    )
  )
  have (
    obs=1
    keep=id date
    rename=(
      id = _id
      date = _date
    )
  )
;
retain _start _type;
if first.id
then do;
  _start = date;
  _type = type;
end;
_type = ifc(type < _type,type,_type);
if _id ne id or _date gt _start + 10
then do;
  type = _type;
  output;
end;
drop _:;
run;
Emma2021
Quartz | Level 8

The code does not work for below data (when the same type and within 10 days it shifts the date, but I do not want to shift any date):

 

data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
1 A-1 1/23/2021
1 A-1 1/25/2021
1 A-0 1/25/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
6 0-B 07/22/2020
6 1-B 08/05/2021
6 1-B 08/11/2021
6 1-B 08/18/2021
6 1-B 01/26/2022
;

data wanted;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-1 1/4/2021
1 A-0 1/25/2021
2 A-1 1/1/2021
3 A-1 1/1/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
6 0-B 07/22/2020
6 1-B 08/05/2021
6 1-B 01/26/2022
;

Kurt_Bremser
Super User

Your WANT data is inconsistent.

For the first "window" for ID 1 and 4, you take the last date, but for the windows of ID 2 and 3 you take the first.

 

Please restate your requirements in a complete and consistent way for all variables in the WANT dataset.

Emma2021
Quartz | Level 8
Thank you, but the both codes do not work.
mkeintz
PROC Star

Assuming the data are sorted by ID/DATE, I believe your criterion can be restated as "keep the first record for each ID, and all other records that are more than 10 days after their predecessor".

 

If so, then it's easy:

 

data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
run;

data want;
  set have ;
  by id;

  if first.id=1 or date-10>lag(date);
run;
  

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Emma2021
Quartz | Level 8
Thank you! I should sort by id type and date and I think this should work!
Emma2021
Quartz | Level 8
Sorry it does not work too: see below data:

data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
1 A-1 1/23/2021
1 A-1 1/25/2021
1 A-0 1/25/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
6 0-B 07/22/2020
6 1-B 08/05/2021
6 1-B 08/11/2021
6 1-B 08/18/2021
6 1-B 01/26/2022
;
data wanted;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-1 1/4/2021
1 A-0 1/25/2021
2 A-1 1/1/2021
3 A-1 1/1/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
6 0-B 07/22/2020
6 1-B 08/05/2021
6 1-B 01/26/2022
mkeintz
PROC Star

Simply telling me "it does not work"   ...   does not work.  Please explain what you got that was unexpected or undesired/  And show the log of the failing program.

 

Also earlier you said

@Emma2021 wrote:
Thank you! I should sort by id type and date and I think this should work!

Why are you sorting by id type and date.   You initial posting asked only for 10-day separation within ID, implying no role for type, and therefore no apparent need for type to be a sort key.    Are you now saying you want 10-day separation within each ID/TYPE?   Is there some other role of the variable type that has not yet been explained (such as choosing between two types within an ID if they have tied dates)?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Emma2021
Quartz | Level 8
It runs but it does not remove the desired rows.
The rule is simple:
Sort by id and date, then remove all rows within 10 days: but before removing all rows within 10-days -check the type —keep only the one row with lowest type value.
mkeintz
PROC Star

Then the task is to read through a time span - i.e. all obs until there is a trailing 10day+ gap.  Over the course of that span determine what is the lowest type.  Then re-read the same obs, and output the first one that has type equal to lowest type.

 

data want (drop=_: );
  /* Read and count obs until a ten-day gap follows, determine lowest type */
  _lowest_type='   ';
  do _i=1 by 1 until (last.id=1 or date+10<_nxt_date);
    set have (keep=id);
    by id;
    merge have have (firstobs=2 keep=date rename=(date=_nxt_date));
    if _lowest_type=' ' then _lowest_type=type;
    else if type<_lowest_type then _lowest_type=type;  
  end;

  /* Reread obs and output first instance with lowest type */
  do _i=_i to 1 by -1;
    set have;
    if type=_lowest_type then do;
      output;
      call missing(_lowest_type); /*Prevent multiple hits */
    end;
  end;  
run;

Note the data need to be in chronological order, (sort by ID/DATE).  You don't need to include TYPE in the sort keys.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Emma2021
Quartz | Level 8
This code does not do anything

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 866 views
  • 3 likes
  • 3 in conversation