Hi everyone,
I'm trying to create a new eligibility file based on eligibility info (Eligible) and 30-day tracking eligibility info.
Further, the "want" data set keeps only eligible records and concatenate the records with less than 30-day gap.
I hope this makes sense. Any programming insight on how to accomplish "want" data set is really appreciated.
Thank you!
Lichee
The following code returns for your sample data what you've asked for.
data want;
infile datalines truncover dsd;
input Person_ID $ Begin_Date :mmddyy10. End_Date :mmddyy10.;
format Begin_Date End_Date mmddyy10.;
datalines;
111111,7/14/2018,7/19/2018
111111,7/26/2018,7/30/2018
111111,11/17/2018,11/26/2018
222222,5/17/2018,5/19/2018
444444,12/21/2018,12/28/2018
555555,4/28/2018,5/3/2018
555555,5/6/2018,5/10/2018
555555,11/26/2018,11/29/2018
666666,1/14/2018,1/24/2018
666666,1/29/2018,1/31/2018
666666,2/4/2018,2/5/2018
666666,2/7/2018,2/12/2018
666666,8/18/2018,8/23/2018
666666,10/3/2018,10/16/2018
666666,11/4/2018,11/5/2018
666666,11/5/2018,11/12/2018
;
run;
data have;
infile datalines truncover dsd;
input Person_ID $ Begin_Date :mmddyy10. End_Date :mmddyy10. Eligible;
format Begin_Date End_Date date9.;
if _n_=1 then
do;
dcl hash h1(dataset:'want');
h1.defineKey('Person_ID', 'Begin_Date', 'End_Date');
h1.defineDone();
end;
want_flg= h1.check()=0;
datalines;
111111,3/13/2018,3/15/2018,0
111111,4/24/2018,4/26/2018,0
111111,7/13/2018,7/14/2018,0
111111,7/14/2018,7/19/2018,1
111111,7/26/2018,7/30/2018,1
111111,11/17/2018,11/26/2018,1
222222,5/1/2018,5/4/2018,0
222222,5/17/2018,5/19/2018,1
333333,3/8/2018,3/9/2018,0
333333,4/17/2018,4/26/2018,0
333333,6/16/2018,6/20/2018,0
333333,7/3/2018,7/4/2018,0
333333,7/23/2018,7/24/2018,0
333333,8/9/2018,8/10/2018,0
333333,9/8/2018,9/11/2018,1
333333,9/11/2018,9/14/2018,1
333333,9/23/2018,9/26/2018,0
444444,7/5/2018,7/7/2018,0
444444,12/21/2018,12/28/2018,1
555555,4/28/2018,5/3/2018,1
555555,5/6/2018,5/10/2018,1
555555,11/26/2018,11/29/2018,1
666666,1/14/2018,1/24/2018,1
666666,1/29/2018,1/31/2018,1
666666,2/4/2018,2/5/2018,1
666666,2/7/2018,2/12/2018,1
666666,8/18/2018,8/23/2018,1
666666,8/24/2018,8/25/2018,1
666666,8/28/2018,9/3/2018,1
666666,9/6/2018,9/11/2018,1
666666,9/13/2018,9/16/2018,1
666666,9/22/2018,9/23/2018,0
666666,10/3/2018,10/16/2018,1
666666,11/4/2018,11/5/2018,1
666666,11/5/2018,11/12/2018,1
666666,12/25/2018,12/28/2018,1
666666,12/30/2018,12/31/2018,0
;
run;
proc sort data=have;
by Person_ID descending Begin_Date;
run;
data derived;
set have;
by Person_ID descending Begin_Date;
length derived_flg check_date 8;
format check_date date9.;
retain check_date;
if first.person_id and Eligible=1 then check_date='31Dec5999'd;
else if Eligible=0 then check_date=Begin_Date;
if Eligible=0 then delete;
diff=check_date-Begin_Date;
derived_flg= check_date-Begin_Date>30;
run;
proc print data=derived;
run;
As far as I understand, your requirement is something like this -
A person is eligible for something, say some offer, only if that person has any of the visits within 30 day gap of another date. Is that correct? If I have understood the requirement correctly, then I would like to share the logic with which you can go ahead. At this point, I will not go into the programming but only the logic of it.
1) Find out eligible records. I guess a person has multiple records and only few records will be eligible. So first find out all the eligible records.
2) Once that is done, now we have to find out eligible persons and not records. If any of the record is eligible for a person, then the person becomes eligible. The person is not eligible only if any of the record of that person is not eligible. Hence find out eligible persons now.
3) Once you have eligible persons, keep only those (eligible) persons in the dataset and remove non eligible persons from the dataset.
Once you agree with the logic of the problem solving, please try to write programme for the same. If you do not agree with my articulation of the problem or if you think that I have not understood it correctly, please let me know it. Once you will start writing the programme with these 3 steps, feel free to let me know if you get stuck up somewhere.
Happy programming!
Thank you.
- Dr. Abhijeet Safai
You're raising two distinct problems here: 1: how to select records, 2: how to collapse the selected records.
For record selection:
That your date ranges can overlap or that one date range can even be contained within another really complicates things. In such a scenario simply ordering the data by begin date and then test against the last not eligible record might not be sufficient for your real data. What for example should happen in below scenario?
person_id | begin_date | end_date | eligible |
1 | 01jan2024 | 31jan2023 | 1 |
1 | 10jan2024 | 15jan2024 | 0 |
Above is eventually not possible in your real data but if it is then you need to amend your sample data so it covers all possible real cases - and then test your selection logic against it before we even get into coding.
For collapsing the selected records:
Also here overlapping date ranges can complicate things. Try to provide us with sample data that cover all possible cases for selected records to be collapsed. Especially provide sample data with more multiple records that need to get collapsed into a single record (including a record where a data range is fully contained within another date range so that the records don't sort "properly" by begin date).
Below code returns your WANT table but because of assumed missing cases I don't believe it would work for your real data.
proc sort data=have;
by person_id descending Begin_Date;
run;
/* select records */
data selected_records;
set have;
by person_id descending begin_date;
retain check_date;
if first.person_id then check_date='01jan1960'd;
if eligible=0 then
do;
check_date=End_Date;
delete;
end;
if begin_date-check_date<=30 then delete;
run;
Thanks a lot! Your code helps me almost get there. One scenario somehow got dropped although I expect that to be kept. I just added Person_ID 66666 to HAVE data, where I expected the first four records (1/14/2018-2/12/2018) to be kept because there was no records within 30 days to have Eligible=0.
How many observations are you dealing with?There are some approaches that work with overlaps that would create one observation for each date in the interval but that may not be practical with large data sets.
Also, you may be using some terminology in the description of your problem that we have to guess about.
For example you use "follow-up record". Does this mean anything other than a later date? "Follow-up" might be implying something, such as a specific follow-up for something that only occurs on SOME of the observations.
This demonstrates getting individual visit dates instead of intervals and sorting:
data maybeuseful; set have; do visitdate = Begin_Date to End_Date; output; end; format visitdate mmddyy10.; run; proc sort data=maybeuseful; by Person_ID Visitdate begin_date; run;
It is going to be a bit easier to deal with continuous, or within stated period of previous visit with this.
I'm pretty sure I don't understand the "eligible" bits so I have not done anything along those lines.
But I suspect the above data set will be easier to "collapse".
I cannot understand what you HAVE already and what your WANT to calculate.
Do you only have the date ranges? Or do you also have the ELIGIBLE flag? Or do you also already have the other two flags? And if you already have the flags then what are you trying to calculate?
What is a FOLLOW_UP record? Do you mean another observation in your HAVE dataset? Or perhaps only a record with a specific value of the ELIGIBLE flag?
What is a Track_30_day? Do mean exactly 30 days? Or within as most 30 days? From which of the dates? Only the END_DATE? So a record that ended on Jan 1st that did not have another record until after Jan 31st does not have a "track_30_day"? What about the situation where the record ended on Jan 1st and the next record ended before Jan 25th? Would that count as "track_30_day"? Perhaps not since at 30 days there is no record? Or do you mean that now you want to extend the 30 day window from Jan 25th instead of from Jan 1st (so that you can build longer intervals of "eligibility" from multiple shorter intervals.) Do the intervals that are merged have to all have the same value of ELIGIBLE (assuming that was part of your input and not something to be calculated.)
@lichee I'm still struggling to understand your selection logic. Like: If WITHIN 30 days means Before and After.
Given that record selection and collapsing records are two different problems and that record selection needs resolution first, I suggest that you provide us with a HAVE data set that marks all the records that you want selected.
I also suggest that you use date format in the data and your explanations that's international. MDY formats are really hard to read for most non-US people.
Below the sample data structure I'd suggest. You will need to amend the data for the last column select_desired_flg
data have;
infile datalines truncover dsd;
input Person_ID $ Begin_Date :date9. End_Date :date9. Eligible select_desired_flg;
format Begin_Date End_Date date9.;
datalines;
111111,13MAR2018,15MAR2018,0,0
111111,24APR2018,26APR2018,0,0
111111,13JUL2018,14JUL2018,0,0
111111,14JUL2018,19JUL2018,1,1
111111,26JUL2018,30JUL2018,1,1
111111,17NOV2018,26NOV2018,1,1
222222,01MAY2018,04MAY2018,0,0
222222,17MAY2018,19MAY2018,1,1
333333,08MAR2018,09MAR2018,0,0
333333,17APR2018,26APR2018,0,0
333333,16JUN2018,20JUN2018,0,0
333333,03JUL2018,04JUL2018,0,0
333333,23JUL2018,24JUL2018,0,0
333333,09AUG2018,10AUG2018,0,0
333333,08SEP2018,11SEP2018,1,0
333333,11SEP2018,14SEP2018,1,0
333333,23SEP2018,26SEP2018,0,0
444444,05JUL2018,07JUL2018,0,0
444444,21DEC2018,28DEC2018,1,1
555555,28APR2018,03MAY2018,1,1
555555,06MAY2018,10MAY2018,1,0
555555,26NOV2018,29NOV2018,1,1
666666,14JAN2018,24JAN2018,1,0
666666,29JAN2018,31JAN2018,1,0
666666,04FEB2018,05FEB2018,1,0
666666,07FEB2018,12FEB2018,1,0
666666,18AUG2018,23AUG2018,1,0
666666,24AUG2018,25AUG2018,1,0
666666,28AUG2018,03SEP2018,1,0
666666,06SEP2018,11SEP2018,1,0
666666,13SEP2018,16SEP2018,1,0
666666,22SEP2018,23SEP2018,0,0
666666,03OCT2018,16OCT2018,1,0
666666,04NOV2018,05NOV2018,1,0
666666,05NOV2018,12NOV2018,1,0
666666,25DEC2018,28DEC2018,1,0
666666,30DEC2018,31DEC2018,0,0
;
run;
I also suggest that you provide us with a re-formulated set of your selection logic. Given the answers you've got so far I assume that I'm not the only one struggling with your narrative.
Sorry for all the confusion! Below is the updated HAVE (data I have) and WANT (target data I want). Essentially, I wanted to keep eligible records. But if a record with any ineligible record within 30 days for the same Person_ID, the record should be dropped.
data want;
infile datalines truncover dsd;
input Person_ID $ Begin_Date :mmddyy10. End_Date :mmddyy10.;
format Begin_Date End_Date mmddyy10.;
datalines;
111111,7/14/2018,7/19/2018
111111,7/26/2018,7/30/2018
111111,11/17/2018,11/26/2018
222222,5/17/2018,5/19/2018
444444,12/21/2018,12/28/2018
555555,4/28/2018,5/3/2018
555555,5/6/2018,5/10/2018
555555,11/26/2018,11/29/2018
666666,1/14/2018,1/24/2018
666666,1/29/2018,1/31/2018
666666,2/4/2018,2/5/2018
666666,2/7/2018,2/12/2018
666666,8/18/2018,8/23/2018
666666,10/3/2018,10/16/2018
666666,11/4/2018,11/5/2018
666666,11/5/2018,11/12/2018
;
run;
But if a record with any ineligible record within 30 days for the same Person_ID, the record should be dropped.
WITHIN means Before And After?
Why is below record WANT....
111111,7/14/2018,7/19/2018
...given your HAVE data also contains below record with eligible=0?
111111,7/13/2018,7/14/2018,0
The following code returns for your sample data what you've asked for.
data want;
infile datalines truncover dsd;
input Person_ID $ Begin_Date :mmddyy10. End_Date :mmddyy10.;
format Begin_Date End_Date mmddyy10.;
datalines;
111111,7/14/2018,7/19/2018
111111,7/26/2018,7/30/2018
111111,11/17/2018,11/26/2018
222222,5/17/2018,5/19/2018
444444,12/21/2018,12/28/2018
555555,4/28/2018,5/3/2018
555555,5/6/2018,5/10/2018
555555,11/26/2018,11/29/2018
666666,1/14/2018,1/24/2018
666666,1/29/2018,1/31/2018
666666,2/4/2018,2/5/2018
666666,2/7/2018,2/12/2018
666666,8/18/2018,8/23/2018
666666,10/3/2018,10/16/2018
666666,11/4/2018,11/5/2018
666666,11/5/2018,11/12/2018
;
run;
data have;
infile datalines truncover dsd;
input Person_ID $ Begin_Date :mmddyy10. End_Date :mmddyy10. Eligible;
format Begin_Date End_Date date9.;
if _n_=1 then
do;
dcl hash h1(dataset:'want');
h1.defineKey('Person_ID', 'Begin_Date', 'End_Date');
h1.defineDone();
end;
want_flg= h1.check()=0;
datalines;
111111,3/13/2018,3/15/2018,0
111111,4/24/2018,4/26/2018,0
111111,7/13/2018,7/14/2018,0
111111,7/14/2018,7/19/2018,1
111111,7/26/2018,7/30/2018,1
111111,11/17/2018,11/26/2018,1
222222,5/1/2018,5/4/2018,0
222222,5/17/2018,5/19/2018,1
333333,3/8/2018,3/9/2018,0
333333,4/17/2018,4/26/2018,0
333333,6/16/2018,6/20/2018,0
333333,7/3/2018,7/4/2018,0
333333,7/23/2018,7/24/2018,0
333333,8/9/2018,8/10/2018,0
333333,9/8/2018,9/11/2018,1
333333,9/11/2018,9/14/2018,1
333333,9/23/2018,9/26/2018,0
444444,7/5/2018,7/7/2018,0
444444,12/21/2018,12/28/2018,1
555555,4/28/2018,5/3/2018,1
555555,5/6/2018,5/10/2018,1
555555,11/26/2018,11/29/2018,1
666666,1/14/2018,1/24/2018,1
666666,1/29/2018,1/31/2018,1
666666,2/4/2018,2/5/2018,1
666666,2/7/2018,2/12/2018,1
666666,8/18/2018,8/23/2018,1
666666,8/24/2018,8/25/2018,1
666666,8/28/2018,9/3/2018,1
666666,9/6/2018,9/11/2018,1
666666,9/13/2018,9/16/2018,1
666666,9/22/2018,9/23/2018,0
666666,10/3/2018,10/16/2018,1
666666,11/4/2018,11/5/2018,1
666666,11/5/2018,11/12/2018,1
666666,12/25/2018,12/28/2018,1
666666,12/30/2018,12/31/2018,0
;
run;
proc sort data=have;
by Person_ID descending Begin_Date;
run;
data derived;
set have;
by Person_ID descending Begin_Date;
length derived_flg check_date 8;
format check_date date9.;
retain check_date;
if first.person_id and Eligible=1 then check_date='31Dec5999'd;
else if Eligible=0 then check_date=Begin_Date;
if Eligible=0 then delete;
diff=check_date-Begin_Date;
derived_flg= check_date-Begin_Date>30;
run;
proc print data=derived;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.