BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Aidaan_10
Calcite | Level 5

Hi,  

        I have attached some example input and output datasets.

So here the Start_date for ids's done after the first one must be later than the previous_date. If Start_date for the current record is before or equal to the previous previous_date,  I need to remove the record. Then I need to create a sequence number for each record within ID, where Counter resets for each new ID. 

After the first record I need to look to the previous date from second record onwards and create a sequence. 

 

Taking EFG ID as an example : the record with start_date 10DEC2021 wont be an eligible record because the previous_date
for previous record is 12dec2021 so the current records start date(10dec2021) is before the previous records previous date thus this record is excluded.
And for 13DEC2021 startdate record it will be eligible record whereas 16DEC2021 wont be eligible record as the date of previous eligible record previous_date(16DEC2021) is equal to this current records start_date.
I tried using lag and counting the rows but with lag its an lengthy procedure to count all previous rows.

 

Input dataset:

ID      Start_date    Previous_date

ABC 03AUG2020 08AUG2020
ABC 11AUG2020 12AUG2020
ABC 03NOV2020 05NOV2020
ABC 09NOV2020 16FEB2021
ABC 20NOV2020 24NOV2020
ABC 07DEC2020 13DEC2020
ABC 17DEC2020 24DEC2020
ABC 25JAN2021 29JAN2021
ABC 24FEB2021 15MAR2021
ABC 18MAR2021 22MAR2021
ABC 12APR2021 15APR2021
ABC 14APR2021 16APR2021
ABC 20APR2021 28APR2021
ABC 19MAY2021 26MAY2021
EFG 08MAR2021 10MAR2021
EFG 08APR2021 15APR2021
EFG 30APR2021 06MAY2021
EFG 18MAY2021 25MAY2021
EFG 20AUG2021 23AUG2021
EFG 08SEP2021 10SEP2021
EFG 06OCT2021 07OCT2021
EFG 22NOV2021 23NOV2021
EFG 07DEC2021 12DEC2021
EFG 10DEC2021 13DEC2021
EFG 13DEC2021 16DEC2021
EFG 16DEC2021 21DEC2021
;

 

The output should be 

ID     Start_date        Previous_date  Seq

ABC 03AUG2020 08AUG2020 1
ABC 11AUG2020 12AUG2020 2
ABC 03NOV2020 05NOV2020 3
ABC 09NOV2020 16FEB2021 4
ABC 24FEB2021 15MAR2021 5
ABC 18MAR2021 22MAR2021 6
ABC 12APR2021 15APR2021 7
ABC 20APR2021 28APR2021 8
ABC 19MAY2021 26MAY2021 9
EFG 08MAR2021 10MAR2021 1
EFG 08APR2021 15APR2021 2
EFG 30APR2021 06MAY2021 3
EFG 18MAY2021 25MAY2021 4
EFG 20AUG2021 23AUG2021 5
EFG 08SEP2021 10SEP2021 6
EFG 06OCT2021 07OCT2021 7
EFG 22NOV2021 23NOV2021 8
EFG 07DEC2021 12DEC2021 9
EFG 13DEC2021 16DEC2021 10
;

Any help is greatly appreciated. Thanks In Advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Aidaan_10 If I understand the required logic right then below code should return what you're asking for.

data want;
  set have;
  by id start_date;
  retain _l_previous_date;
  format _l_previous_date date9.;

  if not first.id then
    do;
      if (start_date-_l_previous_date)<=0 then delete;
    end;
  _l_previous_date=previous_date;
  drop _l_previous_date;
run;

Patrick_0-1642210657187.png

 

You haven't asked for it so just guessing: Wouldn't you also need to amend the Previous_Date value? Like for 24FEB2021 the Previous_Date value would become the max value derived from the deleted rows (=29JAN2021).

View solution in original post

7 REPLIES 7
Reeza
Super User
Please include what you've tried with Lags etc.
Aidaan_10
Calcite | Level 5
I tried to get the prior previous_Date from the valid eligible record.
ballardw
Super User

Here is how to provide the example data as a date step so we can use that data.

data have;
   input ID $ Start_date :Date9. Previous_date :Date9.;
   format start_date previous_date date9.;
datalines;
ABC 03AUG2020 08AUG2020
ABC 11AUG2020 12AUG2020
ABC 03NOV2020 05NOV2020
ABC 09NOV2020 16FEB2021
ABC 20NOV2020 24NOV2020
ABC 07DEC2020 13DEC2020
ABC 17DEC2020 24DEC2020
ABC 25JAN2021 29JAN2021
ABC 24FEB2021 15MAR2021
ABC 18MAR2021 22MAR2021
ABC 12APR2021 15APR2021
ABC 14APR2021 16APR2021
ABC 20APR2021 28APR2021
ABC 19MAY2021 26MAY2021
EFG 08MAR2021 10MAR2021
EFG 08APR2021 15APR2021
EFG 30APR2021 06MAY2021
EFG 18MAY2021 25MAY2021
EFG 20AUG2021 23AUG2021
EFG 08SEP2021 10SEP2021
EFG 06OCT2021 07OCT2021
EFG 22NOV2021 23NOV2021
EFG 07DEC2021 12DEC2021
EFG 10DEC2021 13DEC2021
EFG 13DEC2021 16DEC2021
EFG 16DEC2021 21DEC2021
;

Now you need to explain why in these records the 4 following the 09NOV2020 start_date are removed. Yes 20NOV2020 is before 16FEF2021 but 07DEC2020 is not before 24NOV2020. So your actual output is apparently using a rule that is much more than looking at the record immediately before, which is much more complex than your description.

ABC 09NOV2020 16FEB2021
ABC 20NOV2020 24NOV2020
ABC 07DEC2020 13DEC2020
ABC 17DEC2020 24DEC2020
ABC 25JAN2021 29JAN2021
Aidaan_10
Calcite | Level 5
Apologies for posting the data in that manner. And Yes, you are right the logic is too complex then mentioned above the 16FEB2021 previous date record is the eligible record and the records after that are ineligible and the next eligible record will be the 24FEB2021 so basically we are not looking for just previous record date but we are looking for the eligible records previous date for next current record.
Patrick
Opal | Level 21

@Aidaan_10 If I understand the required logic right then below code should return what you're asking for.

data want;
  set have;
  by id start_date;
  retain _l_previous_date;
  format _l_previous_date date9.;

  if not first.id then
    do;
      if (start_date-_l_previous_date)<=0 then delete;
    end;
  _l_previous_date=previous_date;
  drop _l_previous_date;
run;

Patrick_0-1642210657187.png

 

You haven't asked for it so just guessing: Wouldn't you also need to amend the Previous_Date value? Like for 24FEB2021 the Previous_Date value would become the max value derived from the deleted rows (=29JAN2021).

Aidaan_10
Calcite | Level 5
Thanks a ton Patrick, this is what I wanted. I was trying to achieve this via so many lag statements and flagging variables and all but this is great. Thanks again.
Tom
Super User Tom
Super User

So your START_DATE and PREVIOUS_DATE variables really represent START and END dates (AKA an interval)?  And your goal is to collapse the overlapping/contiguous intervals into a single interval?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 628 views
  • 4 likes
  • 5 in conversation