BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
VarunD
Obsidian | Level 7

Hello,

 

I want to subset the Input Table by comparing the Service Date of each Customer_ID.

If difference in Service Date of a Customer_ID is within 7 days, then select only the observation with oldest date.

If difference in Service Date is more than 7 days then select all the observations. Please see the Input and Output Tables below and help me with the Code.

 

                                 Input Table
Customer_IDTransaction_IDService Date
AB11234510/12/18
AB11234610/14/18
AB11234710/15/18
AB2222336/10/18
AB2222349/12/18
AB3356781/5/19

 

                              Output Table
Customer_IDTransaction_IDService Date
AB11234510/12/18
AB2222336/10/18
AB2222349/12/18
AB3356781/5/19

 

Thanks,
Varun

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You want to drop a transaction that is within 7 days of the prior transaction (presumably you want this date interval honored only WITHIN id's, not BETWEEN id's - that's @novinosrin's question):

 

data have;
  input Customer_ID :$3. Transaction_ID Service_Date  :mmddyy8.;
  format service_date date9.;
datalines;;
AB1 12345 10/12/18 
AB1 12346 10/14/18 
AB1 12347 10/15/18 
AB2 22233 6/10/18 
AB2 22234 9/12/18 
AB3 35678 1/5/19 
run;
data want (drop=last_output_date);
  set have ;
  by customer_id;
  if first.customer_id
    or service_date-7 > last_output_date;
  output;
  last_output_date=service_date;
  retain last_output_date;
run;

This program creates a variable LAST_OUTPUT_DATE, preserving the most recent date that passes your criterion.  It is RETAINed across observations (not automatically reset to missing at top of data step), making it possible for you to compare the observation in hand to some previously generated value.  This is necessary because you can't just compare one date to the immediately prior date - instead you may have to compare it to a date further back than the prior date.  That's the situation for id AB1.

 

The IF statement above has no THEN clause.  Therefore it is a subsetting if.  Only observations meeting the if condition are processed further.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

I would like you to explain 

 

What if dif<=7 in some transactions and dif>7  in other transactions for the same customer_id?

 

Customer_ID Transaction_ID ServiceDate dif
AB1 12345 10/12/2018 .
AB1 12346 10/14/2018 2
AB1 12347 10/15/2018 1
AB2 22233 06/10/2018 .
AB2 22234 09/12/2018 94
AB3 35678 01/05/2019 .
VarunD
Obsidian | Level 7

That's is a great question. I did not think of that before.

Thanks for asking.

 

If diff is greater than 7 than I would like the Service Date to be compared with the last selected observation and perform the same logic.

Here, I have modified the Input and Output example. Hope this will make it clear and it's the same thing you were asking :

Input Table   
Customer_IDTransaction_IDService DateDiff
AB11234510/12/18-
AB11234610/14/182
AB11234710/15/181
AB2222336/10/18-
AB2222349/12/1894
AB2223569/15/183
AB3356786/6/18-
AB3356796/13/187
AB3356806/15/189
AB3356816/18/183

 

Output Table  
Customer_IDTransaction_IDService Date
AB11234510/12/18
AB2222336/10/18
AB2222349/12/18
AB3356786/6/18
AB3356806/15/18

 

 

Thanks for your quick response,
Varun

 

SuryaKiran
Meteorite | Level 14

Hello,

 

Please provide your sample data in the form of a datastep, this will help more number of people responding with spending their little time.

 

Check this:

data have;
infile datalines dlm='09'x dsd;
input Customer_ID $	Transaction_ID	Service_Date mmddyy8.;
format Service_Date mmddyy10.;
datalines;
AB1	12345	10/12/18
AB1	12346	10/14/18
AB1	12347	10/15/18
AB2	22233	6/10/18
AB2	22234	9/12/18
AB3	35678	1/5/19
;
run;

proc sort data=have;
by Customer_ID 	Transaction_ID	descending Service_Date;
run;

data have_dif;
set have;
by Customer_ID 	Transaction_ID	descending Service_Date;
dif_dt=dif(Service_Date);
if first.Customer_ID then dif_dt=0;
run;

proc sql;
select Customer_ID ,Transaction_ID,Service_Date
	from have_dif
	where customer_id in (select customer_id from have_dif where dif_dt>7)
union 
select Customer_ID ,Transaction_ID,Service_Date
	from have_dif
	where customer_id not in (select customer_id from have_dif where dif_dt>7)
group by Customer_ID
	having Service_Date=min(Service_Date)
order by customer_id
;
quit;

 

Thanks,
Suryakiran
VarunD
Obsidian | Level 7
Thanks for your response! I will ensure that I provide data in datastep for next time.
mkeintz
PROC Star

You want to drop a transaction that is within 7 days of the prior transaction (presumably you want this date interval honored only WITHIN id's, not BETWEEN id's - that's @novinosrin's question):

 

data have;
  input Customer_ID :$3. Transaction_ID Service_Date  :mmddyy8.;
  format service_date date9.;
datalines;;
AB1 12345 10/12/18 
AB1 12346 10/14/18 
AB1 12347 10/15/18 
AB2 22233 6/10/18 
AB2 22234 9/12/18 
AB3 35678 1/5/19 
run;
data want (drop=last_output_date);
  set have ;
  by customer_id;
  if first.customer_id
    or service_date-7 > last_output_date;
  output;
  last_output_date=service_date;
  retain last_output_date;
run;

This program creates a variable LAST_OUTPUT_DATE, preserving the most recent date that passes your criterion.  It is RETAINed across observations (not automatically reset to missing at top of data step), making it possible for you to compare the observation in hand to some previously generated value.  This is necessary because you can't just compare one date to the immediately prior date - instead you may have to compare it to a date further back than the prior date.  That's the situation for id AB1.

 

The IF statement above has no THEN clause.  Therefore it is a subsetting if.  Only observations meeting the if condition are processed further.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
VarunD
Obsidian | Level 7
This seems to have worked. I will have to verify the output that might take some time. But this looks good. Thanks a lot !
Astounding
PROC Star

"help me with the code" can mean a lot of things, anything from:

 

  • I got an error that I don't know how to fix, to
  • I have no idea what to do, so write it for me

If I were writing the code, I would begin with a simple step.  Assuming the data set is in sorted order by Customer_ID, and assuming that Service_Date is an actual SAS date and not a character string:

 

proc summary data=have;

by customer_ID;

var Service_Date;

output out=ranges (keep=customer_ID First_Date Last_Date) min=first_date max=last_date;

run;

 

This gives you a SAS data set with one observation per Customer_ID, with the first and last date of service for that customer.  If you had such a SAS data set, would you have any idea of what you need to do next?  Are you able to outline a plan (not necessarily write the program)?

VarunD
Obsidian | Level 7
Thanks for you reply. I meant - I have no idea.
Service_Date is not actual SAS Date.
I am hoping to get more than one observation per Customer_ID.
novinosrin
Tourmaline | Level 20

Hi @VarunD  I think @mkeintz  comprehended your need pretty well. Play with his solution and see if that works. If I were to take a stab, might partially plagiarize on his idea as that datastep gives a beautiful start. If that solution requires tweaking and Mark happens to be busy, I will chime in. 

mkeintz
PROC Star

@VarunD wrote:
...
Service_Date is not actual SAS Date.
...

 

But Service_Date becomes a SAS datea via the INPUT statement below, which tells sas to interpret that field with the MMDDYY8. inbound format.  This means that sas converts the input field to a sas datea value (a value recording the number of days after or before 1/1/1960).  You can make these numbers easy to visually interpret by assign a FORMAT to the variable, so that you can realize what the date is for that date value.

 

  input Customer_ID :$3. Transaction_ID Service_Date  :mmddyy8.;
  format service_date date9.;

 

Once this date value is constructed, then a subtraction between two values is the number of days by which they are separated.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1706 views
  • 2 likes
  • 5 in conversation