Solved: Re: identifying new fills

sas33 · Posted 05-02-2019 03:16 PM

Hi

i have following data set and i would like to create output as shown below. i need to output the records if difference between current fill date and last fill dt is > 120 with in id.

data have;
infile datalines dsd truncover;
input id $10. fill_dt:mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

Output:

1000 08/21/2018
2000 02/02/2018
2000 11/07/2018
3000 03/05/2018
3000 03/05/2018
3000 10/24/2018

AMSAS · Posted 05-02-2019 03:55 PM

This can be achieved with BY GROUP processing and the lag function

Here's an example, although you'll need to adjust it slightly to account for the two 3000 Id records on 03/05/2018 as the example only outputs a single observation and you indicated you wanted 2

data have;
infile datalines ;
input id $ fill_dt mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

proc sort data=have out=srtd ;
	by id fill_dt ;
run ;

data want ;
	set srtd ;
	by id fill_dt ;
	lg=lag1(fill_dt) ;
	lg120=lg+120 ;	
	put id " - " fill_dt date7. " - " lg date7. " - " lg120 date7. ;
	if not(first.id) then do ;
		if lg120<fill_dt then
			output ;
	end ;
	else 
		output ;
run ;

View solution in original post

novinosrin · Posted 05-02-2019 03:27 PM

HI @sas33 Can you please explain this "i need to output the records if difference between current fill date and last fill dt is > 120 with in id." with the data? I am not able to comprehend clearly.

sas33 · Posted 05-02-2019 05:34 PM

here is the explanation how i wanted. if there are more fills after on a particular after 120 days since last fill day, i need to show all those records in output.

id	fill_dt	Days_since_last_fill	Qualifies for output	Comment
1000	08/21/2018		Y	First Fill Day
2000	02/02/2018		Y	First Fill Day
2000	02/17/2018	15	N	Days Difference between Last Fill date and current fill date is <120
2000	11/07/2018	263	Y	> 120 days from previous fill
2000	11/12/2018	5	N	Days Difference between Last Fill date and current fill date is lt 120
2000	11/28/2018	16	N	Days Difference between Last Fill date and current fill date is < 120
3000	03/05/2018		Y	First Fill Day
3000	03/05/2018		Y	First Fill Day
3000	05/29/2018	85	N	Days Difference between Last Fill date and current fill date is < 120
3000	10/24/2018	148	Y	> 120 days from previous fill
3000	11/24/2018	31	N	Days Difference between Last Fill date and current fill date is < 120

novinosrin · Posted 05-02-2019 05:49 PM

Hi @sas33 Thanks!

data have;
infile datalines  truncover;
input id : $10. fill_dt :mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

data want;
set have;
by id;
retain d;
k=dif(fill_dt);
if first.id then do;
call missing(k);
d=fill_dt;
end;
if fill_dt=d or k>120 then output;
drop d k;
run;

sas33 · Posted 05-02-2019 06:21 PM

Thanks you novinosrin! This is very close.

If i have additional record in my input as below for id 3000, i am seeing only one record. i want to see both of them.

3000 10/24/2018

novinosrin · Posted 05-02-2019 07:34 PM

Hi @sas33 Try

data have;
infile datalines  truncover;
input id : $10. fill_dt :mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 10/24/2018
3000 11/24/2018
;
run;

data want;
set have;
by id fill_dt;
retain d f;
k=dif(fill_dt);
if first.id then do;
call missing(k);
d=fill_dt;
end;
if first.fill_dt then f=.;
if fill_dt=d or k>120 then f=1;
if f;
drop d k f;
run;

AMSAS · Posted 05-02-2019 03:55 PM

This can be achieved with BY GROUP processing and the lag function

Here's an example, although you'll need to adjust it slightly to account for the two 3000 Id records on 03/05/2018 as the example only outputs a single observation and you indicated you wanted 2

data have;
infile datalines ;
input id $ fill_dt mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

proc sort data=have out=srtd ;
	by id fill_dt ;
run ;

data want ;
	set srtd ;
	by id fill_dt ;
	lg=lag1(fill_dt) ;
	lg120=lg+120 ;	
	put id " - " fill_dt date7. " - " lg date7. " - " lg120 date7. ;
	if not(first.id) then do ;
		if lg120<fill_dt then
			output ;
	end ;
	else 
		output ;
run ;

ballardw · Posted 05-02-2019 05:01 PM

And another similar:

/* assumes HAVE is sorted by ID and fill_dt*/
data want;
   set have;
   by id fill_dt;
   retain prevdt;
   if first.id  then do;
     output;
     prevdt = fill_dt;
   end;
   else if fill_dt- prevdt>120 then do;
      output;
     prevdt = fill_dt;
   end;
   else prevdt=fill_dt;
   drop prevdt;
run;

But need clarification about the duplicate id fill_dt.

ScottBass · Posted 05-02-2019 07:30 PM

Just as in life, SAS is good at "knowing the past" (lag function for previous record(s)), but not as good as "seeing the future" (reading ahead to the next row and doing something with the current row).

For your issue, the lag function (or retain of a temporary variable to which you assign the fill_dt) will work fine.

With that in mind, below is a design pattern I often use, especially when I need to "look ahead" to the next row(s) and do something with the current row. It takes advantage of a surrogate key (record number) + SQL's Cartesian product to either "look behind" or "look ahead" to previous record(s) or next record(s). This approach assumes the source data is already sorted in keys (id) and chronological (fill_dt) order.

As @ballardw says, you don't make clear why the duplicate (2nd) row for id=3000 constitutes a "first fill"? How can you have two "firsts" in your id group?

Anyway, with the "have" and "want" you provided, this gives your desired result.

Whether you choose to use this approach, IMO this is a handy design pattern to have in your kit bag, and I encourage you to understand how this works.

HTH...

data have;
   input id $4. fill_dt:mmddyy10.;
   format fill_dt MMDDYY10.;
   datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

* create surrogate key ;
data sk / view=sk;
   sk+1;
   set have;
run;

* use SQL + Cartesian product to look behind (or ahead) ;
proc sql;
   create table test as
   select 
      a.id
     ,b.fill_dt as prev_fill_dt
     ,a.fill_dt as curr_fill_dt
   from
      sk a
   left join
      sk b
   on
      a.id = b.id
      and
      a.sk = b.sk+1
   ;
quit;

* final approach ;
proc sql;
   create table want as
   select 
      a.id
     ,b.fill_dt as prev_fill_dt
     ,a.fill_dt as curr_fill_dt
     ,intck('day',b.fill_dt,curr_fill_dt) as days_difference
   from
      sk a
   left join
      sk b
   on
      a.id = b.id
      and
      a.sk-1 = b.sk
   having
      /* first record in the id group */
      missing(prev_fill_dt) 
      or
      /* duplicate record - not sure why the 2nd record is still the "First Fill Day"? */
      intck('day',prev_fill_dt,curr_fill_dt) = 0
      or 
      /* # of days difference between the previous and current record > 120 */
      intck('day',prev_fill_dt,curr_fill_dt) > 120
   ;
quit;

P.S.: I think your post would be a bit clearer if you said "difference between current fill date and previous fill dt" instead of "difference between current fill date and last fill dt". When I first read this, I thought "last fill dt" was the last fill dt (last record) in the id group.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

sas33 · Posted 05-06-2019 08:15 AM

Thank you

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!