BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buechler66
Barite | Level 11

This is a tough one. I don't even know if this is possible.  My requirement is to identify any instance where an ID switches from away from his PI, but later switches back to that same PI within 180 days. There can be any number of PI switches in between the original visit and the return visit.

 

So in the sample data below ID 165074 switches from PI 1699130682 to 1255743092 to PI 1043257041.  But he never switches back to any of the earlier visited PIs.

 

However, ID 176368 switches from PI 1720270135 on THRU_DT 18DEC2017 to PI 1487654356, and later switches back to PI 1720270135 on FROM_DT 11MAY2018.  And since the difference between these two dates is within 180 days I'd want to set an indicator of Recycle_ID=1 for this ID.  The difference in days is calculated as (FROM_DT - THRU_DT of the repeat visits).

 

Does anyone have any ideas on how to accomplish this?  I can't figure out a way to do this.  Any help would be really appreciated.

 

In the example data below I've noted the record that would get the Recycle_Ind set to 1.

 

* Data sorted by ID and FROM_DT

ID     PI         FROM_DT   THRU_DT
165074 1699130682 14SEP2017 12NOV2017
165074 1699130682 13NOV2017 11JAN2018
165074 1255743092 27FEB2018 27APR2018
165074 1255743092 28APR2018 26JUN2018
165074 1043257041 27JUN2018 25AUG2018
176368 1720270135 20OCT2017 18DEC2017
176368 1487654356 12MAR2018 10MAY2018
176368 1720270135 11MAY2018 09JUL2018  <= RECYCLE_IND=1 (11MAY18-18DEC17) IS <= 180
176368 1987654359 10JUL2018 07SEP2018

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @buechler66 

 

I am sorry it did'nt work, that was because I assumed that intervals were consecutive, which caused gaps between same values of PI to be marked also. The following code marks a recycle only if there has been (at least) one other PI value in between.

 

 

data have;
informat FROM_DT date9. THRU_DT date9.;
format FROM_DT date9. THRU_DT date9.;
input ID PI FROM_DT THRU_DT;
datalines;
123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018
234567 3234567890 11MAY2018 12MAY2018
234567 4345678901 13MAY2018 09JUL2018
234567 3234567890 10JUL2018 07SEP2018
;
run;

* Just to make sure input is in concecutive order by ID FROM_DT;
proc sort data=have out=w1;
	by ID FROM_DT;
run;

* Add a sequence number; 
data w2; set w1;
	Seq = _N_;
run;

* sort to group ID's on PI values;
proc sort data=w2 out=w3; 
	by ID PI FROM_DT;
run;

* Pick ID with recurrent PI, not consecutive, but within 180 days;
* Use sequence number to suppress gaps between same PID's;
data w4 (drop=OldPI OldThru OldSeq Seq); set w3; by ID PI;
	OldPI = lag(PI);
	OldThru = lag(THRU_DT);
	OldSeq = lag(Seq);

	if not first.PI and Seq ne (OldSeq+1) then do;
			if OldThru < (FROM_DT-1) and (FROM_DT - OldThru) < 180 then Recycle_ID = 1;
	end;
run;

* Sort to restore input order;
proc sort data=w4 out=want;
	by ID FROM_DT;
run;

 

 

want.gif

 

View solution in original post

21 REPLIES 21
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @buechler66 

 

This is a problem where then LAG function is useful. It just marks one observation, and you will probably need some further processing to get your output. 

 

 

data have;
	informat FROM_DT date9. THRU_DT date9.;
	format FROM_DT date9. THRU_DT date9.;
	input ID PI FROM_DT THRU_DT;
	datalines;
165074 1699130682 14SEP2017 12NOV2017
165074 1699130682 13NOV2017 11JAN2018
165074 1255743092 27FEB2018 27APR2018
165074 1255743092 28APR2018 26JUN2018
165074 1043257041 27JUN2018 25AUG2018
176368 1720270135 20OCT2017 18DEC2017
176368 1487654356 12MAR2018 10MAY2018
176368 1720270135 11MAY2018 09JUL2018
176368 1987654359 10JUL2018 07SEP2018
;
run;


* sort to group ID's on PI values;
proc sort data=have out=want; 
	by ID PI FROM_DT;
run;

* pick ID with recurrent PI, not concecutive, but within 180 days;
data w3 (drop=OldPI OldThru); set w1; by ID PI;
	OldPI = lag(PI);
	OldThru = lag(THRU_DT);
	if not first.PI then do;
		if OldThru < (FROM_DT-1) and (FROM_DT - OldThru) < 180 then Recycle_ID = 1;
	end;
run;

want.gif

 

 

 

buechler66
Barite | Level 11
I will try this out today and reply back. Thank you.
buechler66
Barite | Level 11

Unfortunately this logic didn't work.  I've included new example data and you can see that it's flagging too many rows with Recycle_Ind=1 .  It has to do with this line of code:  if OldThru < (FROM_DT-1) and (FROM_DT - OldThru) < 180 then Recycle_ID = 1;

This assumes that there will only be one day's difference between the current FROM_DT and the previous THRU_DT, when in fact there could be a larger gap. I extended the gap by more than 1 day for the records that are marked as incorrect below.

 

If you have any thoughts on how to correct this I'd love to get this working.

 

For the input data below the following records are identified:

ID     PI         FROM_DT   THRU_DT

123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018 INCORRECTLY RECYCLE=1
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018 RECYCLE=1
234567 3234567890 11MAY2018 12MAY2018 INCORRECTLY RECYCLE=1
234567 4345678901 13MAY2018 09JUL2018 RECYCLE=1
234567 3234567890 10JUL2018 07SEP2018 RECYCLE=1

 

 

 

 

 

data have;
informat FROM_DT date9. THRU_DT date9.;
format FROM_DT date9. THRU_DT date9.;
input ID PI FROM_DT THRU_DT;
datalines;
123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018
234567 3234567890 11MAY2018 12MAY2018
234567 4345678901 13MAY2018 09JUL2018
234567 3234567890 10JUL2018 07SEP2018
;
run;

 

* sort to group ID's on PI values;
proc sort data=have out=want;
by ID PI FROM_DT;
run;

 

* pick ID with recurrent PI, not concecutive, but within 180 days;
data w3 (drop=OldPI OldThru); set wANT; by ID PI;
OldPI = lag(PI);
OldThru = lag(THRU_DT);
if not first.PI then do;
if OldThru < (FROM_DT-1) and (FROM_DT - OldThru) < 180 then Recycle_ID = 1;
end;
run;

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @buechler66 

 

I am sorry it did'nt work, that was because I assumed that intervals were consecutive, which caused gaps between same values of PI to be marked also. The following code marks a recycle only if there has been (at least) one other PI value in between.

 

 

data have;
informat FROM_DT date9. THRU_DT date9.;
format FROM_DT date9. THRU_DT date9.;
input ID PI FROM_DT THRU_DT;
datalines;
123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018
234567 3234567890 11MAY2018 12MAY2018
234567 4345678901 13MAY2018 09JUL2018
234567 3234567890 10JUL2018 07SEP2018
;
run;

* Just to make sure input is in concecutive order by ID FROM_DT;
proc sort data=have out=w1;
	by ID FROM_DT;
run;

* Add a sequence number; 
data w2; set w1;
	Seq = _N_;
run;

* sort to group ID's on PI values;
proc sort data=w2 out=w3; 
	by ID PI FROM_DT;
run;

* Pick ID with recurrent PI, not consecutive, but within 180 days;
* Use sequence number to suppress gaps between same PID's;
data w4 (drop=OldPI OldThru OldSeq Seq); set w3; by ID PI;
	OldPI = lag(PI);
	OldThru = lag(THRU_DT);
	OldSeq = lag(Seq);

	if not first.PI and Seq ne (OldSeq+1) then do;
			if OldThru < (FROM_DT-1) and (FROM_DT - OldThru) < 180 then Recycle_ID = 1;
	end;
run;

* Sort to restore input order;
proc sort data=w4 out=want;
	by ID FROM_DT;
run;

 

 

want.gif

 

buechler66
Barite | Level 11
Wow. This does appear to work. I'm going to do some testing and I'll follow up.

May I ask, why does the code still contain the condition "OLD_THRU < (CLM_FROM_DT-1)"? I'm curious why this condition is still necessary?
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @buechler66 

 

I just forgot to remove it, because it does no harm. It had a purpose in my first attempt, where I thought the intervals were consecutive, so a gap meant that there was an interval with another PI in between. But the check on sequence number makes it obsolete. 

buechler66
Barite | Level 11
Thanks so much for taking the time to help me with this requirement. I'm amazed how you are able to figure this stuff out so quickly. I learned a lot by dissecting your code example. Thank you, thank you, thank you!
PGStats
Opal | Level 21

Not obvious indeed. But here is a shot:

 

data have;
input ID PI (FROM_DT THRU_DT) (:date9.);
format FROM_DT THRU_DT yymmdd10. pi 10.;
datalines;
165074 1699130682 14SEP2017 12NOV2017
165074 1699130682 13NOV2017 11JAN2018
165074 1255743092 27FEB2018 27APR2018
165074 1255743092 28APR2018 26JUN2018
165074 1043257041 27JUN2018 25AUG2018
176368 1720270135 20OCT2017 18DEC2017
176368 1487654356 12MAR2018 10MAY2018
176368 1720270135 11MAY2018 09JUL2018
176368 1987654359 10JUL2018 07SEP2018
;

proc sql;
select 
    *,
    exists (    
        select * 
        from 
            have as b inner join
            have as c 
                on c.id = b.id and c.pi ne b.pi and c.from_dt < b.from_dt
        where   b.id = a.id and 
                b.pi = a.pi and 
                intck("DAY", a.thru_dt, b.from_dt) between 1 and 180 and
                c.thru_dt > a.thru_dt   
        ) as recycle
from have as a;
quit;
                     ID          PI     FROM_DT     THRU_DT   recycle
               ------------------------------------------------------
                 165074  1699130682  2017-09-14  2017-11-12         0
                 165074  1699130682  2017-11-13  2018-01-11         0
                 165074  1255743092  2018-02-27  2018-04-27         0
                 165074  1255743092  2018-04-28  2018-06-26         0
                 165074  1043257041  2018-06-27  2018-08-25         0
                 176368  1720270135  2017-10-20  2017-12-18         1
                 176368  1487654356  2018-03-12  2018-05-10         0
                 176368  1720270135  2018-05-11  2018-07-09         0
                 176368  1987654359  2018-07-10  2018-09-07         0

Would require testing with a complete range of possible cases.

PG
buechler66
Barite | Level 11
This also looks like it will work. I'll try this out today too. Thank you.
buechler66
Barite | Level 11

Hi. Unfortunately this isn't working either when I tested it with this second set of test records.  Here's what I'm expecting to be flagged based on the requirement.  If you run the code below it is flagging differently.  Any suggestions based on this?

 

123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018     Recycle_Ind=1
234567 3234567890 11MAY2018 12MAY2018
234567 4345678901 13MAY2018 09JUL2018     Recycle_Ind=1
234567 3234567890 10JUL2018 07SEP2018     Recycle_Ind=1

 

 

 

data have;
input ID PI (FROM_DT THRU_DT) (:date9.);
format FROM_DT THRU_DT yymmdd10. pi 10.;
datalines;
123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018
234567 3234567890 11MAY2018 12MAY2018
234567 4345678901 13MAY2018 09JUL2018
234567 3234567890 10JUL2018 07SEP2018
;

proc sql;
select
*,
exists (
select *
from
have as b inner join
have as c
on c.id = b.id and c.pi ne b.pi and c.from_dt < b.from_dt
where b.id = a.id and
b.pi = a.pi and
intck("DAY", a.thru_dt, b.from_dt) between 1 and 180 and
c.thru_dt > a.thru_dt
) as recycle
from have as a;
quit;

PGStats
Opal | Level 21

Please explain

 

123456 1234567890 14SEP2017 12NOV2017
123456 1234567890 15NOV2017 11JAN2018
123456 2345678901 27FEB2018 27APR2018
123456 2345678901 28APR2018 26JUN2018
123456 2345678901 27JUN2018 25AUG2018
234567 3234567890 14SEP2017 12FEB2018
234567 4345678901 20FEB2018 25FEB2018
234567 3234567890 12MAR2018 08MAY2018     Recycle_Ind=1
234567 3234567890 11MAY2018 12MAY2018     WHY? 11MAY2018 - 12FEB2018 < 180 days
234567 4345678901 13MAY2018 09JUL2018     Recycle_Ind=1
234567 3234567890 10JUL2018 07SEP2018     Recycle_Ind=1

PG
buechler66
Barite | Level 11
Hi. It's because in that case the PI was visited on two consecutive visits in a row. The requirement is Recycle_Ind=1 when the ID visits different PI in between visits. Since the ID 234567 visited PI 3234567890 two consecutive times the second record should not get marked as a recycled. Sorry if I wasn't clear on that. I do really appreciate your time in providing a solution. You have helped me so often in the past.
mkeintz
PROC Star

If the data is already sorted by ID/FROM_DT then you only need to save the most recent THRU_DT + 180 for each PI, which can be tested against the incoming FROM_DT for the same PI

 

data have;
	informat FROM_DT date9. THRU_DT date9.;
	format FROM_DT date9. THRU_DT date9.;
	input ID PI FROM_DT THRU_DT;
	datalines;
165074 1699130682 14SEP2017 12NOV2017
165074 1699130682 13NOV2017 11JAN2018
165074 1255743092 27FEB2018 27APR2018
165074 1255743092 28APR2018 26JUN2018
165074 1043257041 27JUN2018 25AUG2018
176368 1720270135 20OCT2017 18DEC2017
176368 1487654356 12MAR2018 10MAY2018
176368 1720270135 11MAY2018 09JUL2018
176368 1987654359 10JUL2018 07SEP2018
run;

data want (drop=_:);
  set have;
  by id pi notsorted;
  if _n_=1 then do;
    _thru_plus_180=.;
    declare hash h ();
      h.definekey('pi');
      h.definedata('_thru_plus_180');
      h.definedone();
  end;
  if h.find()=0 and first.pi=1 and from_dt<=_thru_plus_180 then repeat='Y';
  else repeat=' ';
  if last.pi then h.replace(key:pi,data:thru_dt+180);
run;
  1. The hash object is keyed on the PI values, and the program assumes that no PI is attached to more than one ID.  Otherwise you have to modify the above to avoid contaminating tests for one ID with dates from a prior ID, as here:
    change the h.definekey method to:
         h.definekey('ID','PI');
    and change the h.replace method to
        h.replace(key:id,key:pi,data:thru_dt+180);


  2. The  "SET HAVE;  BY ID PI NOTSORTED;" statements tell sas to expect the data in groups, not neccessarily sorted.  This is simply a tool to generate the dummy variables first.pi and last.pi.  So you can test date comparisons every time a PI group starts, and also update the hash object every time you are at the end of a PI group.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
buechler66
Barite | Level 11
This looks interesting, but I have no idea what it's doing. This is the first time I've ever seen this 'hash' concept. I'm afraid this is over my head. 😞

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1360 views
  • 5 likes
  • 6 in conversation