BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas33
Calcite | Level 5

Hi

i have following data set and i would like to create output as shown below. i need to output the records if difference between current fill date and last fill dt is > 120 with in id.

data have;
infile datalines dsd truncover;
input id $10. fill_dt:mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

 

Output:

 

1000 08/21/2018
2000 02/02/2018
2000 11/07/2018
3000 03/05/2018
3000 03/05/2018
3000 10/24/2018

1 ACCEPTED SOLUTION

Accepted Solutions
AMSAS
SAS Super FREQ

This can be achieved with BY GROUP processing and the lag function

Here's an example, although you'll need to adjust it slightly to account for the two 3000 Id records on 03/05/2018 as the example only outputs a single observation and you indicated you wanted 2

 

data have;
infile datalines ;
input id $ fill_dt mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

proc sort data=have out=srtd ;
	by id fill_dt ;
run ;

data want ;
	set srtd ;
	by id fill_dt ;
	lg=lag1(fill_dt) ;
	lg120=lg+120 ;	
	put id " - " fill_dt date7. " - " lg date7. " - " lg120 date7. ;
	if not(first.id) then do ;
		if lg120<fill_dt then
			output ;
	end ;
	else 
		output ;
run ;

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20

HI @sas33   Can you please explain this "i need to output the records if difference between current fill date and last fill dt is > 120 with in id." with the data? I am not able to comprehend clearly.

sas33
Calcite | Level 5

here is the explanation how i wanted. if there are more fills after on a particular after 120 days since last fill day, i need to show all those records in output.

 

 

idfill_dtDays_since_last_fillQualifies for outputComment
100008/21/2018 YFirst Fill Day
200002/02/2018 YFirst Fill Day
200002/17/201815NDays Difference between Last Fill date and current fill date is <120
200011/07/2018263Y> 120 days from previous fill 
200011/12/20185NDays Difference between Last Fill date and current fill date is lt 120
200011/28/201816NDays Difference between Last Fill date and current fill date is < 120
300003/05/2018 YFirst Fill Day
300003/05/2018 YFirst Fill Day
300005/29/201885NDays Difference between Last Fill date and current fill date is < 120
300010/24/2018148Y> 120 days from previous fill 
300011/24/201831N

Days Difference between Last Fill date and current fill date is < 120

 

novinosrin
Tourmaline | Level 20

Hi @sas33  Thanks!

 

data have;
infile datalines  truncover;
input id : $10. fill_dt :mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

data want;
set have;
by id;
retain d;
k=dif(fill_dt);
if first.id then do;
call missing(k);
d=fill_dt;
end;
if fill_dt=d or k>120 then output;
drop d k;
run;

 

 

sas33
Calcite | Level 5

Thanks you novinosrin! This is very close.

 

If i have additional record in my input as below for id 3000, i am seeing only one record. i want to see both of them.

 

3000 10/24/2018

novinosrin
Tourmaline | Level 20

Hi @sas33  Try

 

data have;
infile datalines  truncover;
input id : $10. fill_dt :mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 10/24/2018
3000 11/24/2018
;
run;

data want;
set have;
by id fill_dt;
retain d f;
k=dif(fill_dt);
if first.id then do;
call missing(k);
d=fill_dt;
end;
if first.fill_dt then f=.;
if fill_dt=d or k>120 then f=1;
if f;
drop d k f;
run;
AMSAS
SAS Super FREQ

This can be achieved with BY GROUP processing and the lag function

Here's an example, although you'll need to adjust it slightly to account for the two 3000 Id records on 03/05/2018 as the example only outputs a single observation and you indicated you wanted 2

 

data have;
infile datalines ;
input id $ fill_dt mmddyy10.;
format fill_dt MMDDYY10.;
datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

proc sort data=have out=srtd ;
	by id fill_dt ;
run ;

data want ;
	set srtd ;
	by id fill_dt ;
	lg=lag1(fill_dt) ;
	lg120=lg+120 ;	
	put id " - " fill_dt date7. " - " lg date7. " - " lg120 date7. ;
	if not(first.id) then do ;
		if lg120<fill_dt then
			output ;
	end ;
	else 
		output ;
run ;
ballardw
Super User

And another similar:

/* assumes HAVE is sorted by ID and fill_dt*/
data want;
   set have;
   by id fill_dt;
   retain prevdt;
   if first.id  then do;
     output;
     prevdt = fill_dt;
   end;
   else if fill_dt- prevdt>120 then do;
      output;
     prevdt = fill_dt;
   end;
   else prevdt=fill_dt;
   drop prevdt;
run;

But need clarification about the duplicate id fill_dt.

ScottBass
Rhodochrosite | Level 12

Just as in life, SAS is good at "knowing the past" (lag function for previous record(s)), but not as good as "seeing the future" (reading ahead to the next row and doing something with the current row).

 

For your issue, the lag function (or retain of a temporary variable to which you assign the fill_dt) will work fine.

 

With that in mind, below is a design pattern I often use, especially when I need to "look ahead" to the next row(s) and do something with the current row.  It takes advantage of a surrogate key (record number) + SQL's Cartesian product to either "look behind" or "look ahead" to previous record(s) or next record(s).  This approach assumes the source data is already sorted in keys (id) and chronological (fill_dt) order.

 

As @ballardw says, you don't make clear why the duplicate (2nd) row for id=3000 constitutes a "first fill"?  How can you have two "firsts" in your id group?

 

Anyway, with the "have" and "want" you provided, this gives your desired result. 

 

Whether you choose to use this approach, IMO this is a handy design pattern to have in your kit bag, and I encourage you to understand how this works.

 

HTH...

 

data have;
   input id $4. fill_dt:mmddyy10.;
   format fill_dt MMDDYY10.;
   datalines;
1000 08/21/2018
2000 02/02/2018
2000 02/17/2018
2000 11/07/2018
2000 11/12/2018
2000 11/28/2018
3000 03/05/2018
3000 03/05/2018
3000 05/29/2018
3000 10/24/2018
3000 11/24/2018
;
run;

* create surrogate key ;
data sk / view=sk;
   sk+1;
   set have;
run;

* use SQL + Cartesian product to look behind (or ahead) ;
proc sql;
   create table test as
   select 
      a.id
     ,b.fill_dt as prev_fill_dt
     ,a.fill_dt as curr_fill_dt
   from
      sk a
   left join
      sk b
   on
      a.id = b.id
      and
      a.sk = b.sk+1
   ;
quit;

* final approach ;
proc sql;
   create table want as
   select 
      a.id
     ,b.fill_dt as prev_fill_dt
     ,a.fill_dt as curr_fill_dt
     ,intck('day',b.fill_dt,curr_fill_dt) as days_difference
   from
      sk a
   left join
      sk b
   on
      a.id = b.id
      and
      a.sk-1 = b.sk
   having
      /* first record in the id group */
      missing(prev_fill_dt) 
      or
      /* duplicate record - not sure why the 2nd record is still the "First Fill Day"? */
      intck('day',prev_fill_dt,curr_fill_dt) = 0
      or 
      /* # of days difference between the previous and current record > 120 */
      intck('day',prev_fill_dt,curr_fill_dt) > 120
   ;
quit;

P.S.:  I think your post would be a bit clearer if you said "difference between current fill date and previous fill dt" instead of "difference between current fill date and last fill dt".  When I first read this, I thought "last fill dt" was the last fill dt (last record) in the id group.

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
sas33
Calcite | Level 5

Thank you

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1047 views
  • 0 likes
  • 5 in conversation