Hello,
I want to subset the Input Table by comparing the Service Date of each Customer_ID.
If difference in Service Date of a Customer_ID is within 7 days, then select only the observation with oldest date.
If difference in Service Date is more than 7 days then select all the observations. Please see the Input and Output Tables below and help me with the Code.
Input Table | ||
Customer_ID | Transaction_ID | Service Date |
AB1 | 12345 | 10/12/18 |
AB1 | 12346 | 10/14/18 |
AB1 | 12347 | 10/15/18 |
AB2 | 22233 | 6/10/18 |
AB2 | 22234 | 9/12/18 |
AB3 | 35678 | 1/5/19 |
Output Table | ||
Customer_ID | Transaction_ID | Service Date |
AB1 | 12345 | 10/12/18 |
AB2 | 22233 | 6/10/18 |
AB2 | 22234 | 9/12/18 |
AB3 | 35678 | 1/5/19 |
Thanks,
Varun
You want to drop a transaction that is within 7 days of the prior transaction (presumably you want this date interval honored only WITHIN id's, not BETWEEN id's - that's @novinosrin's question):
data have;
input Customer_ID :$3. Transaction_ID Service_Date :mmddyy8.;
format service_date date9.;
datalines;;
AB1 12345 10/12/18
AB1 12346 10/14/18
AB1 12347 10/15/18
AB2 22233 6/10/18
AB2 22234 9/12/18
AB3 35678 1/5/19
run;
data want (drop=last_output_date);
set have ;
by customer_id;
if first.customer_id
or service_date-7 > last_output_date;
output;
last_output_date=service_date;
retain last_output_date;
run;
This program creates a variable LAST_OUTPUT_DATE, preserving the most recent date that passes your criterion. It is RETAINed across observations (not automatically reset to missing at top of data step), making it possible for you to compare the observation in hand to some previously generated value. This is necessary because you can't just compare one date to the immediately prior date - instead you may have to compare it to a date further back than the prior date. That's the situation for id AB1.
The IF statement above has no THEN clause. Therefore it is a subsetting if. Only observations meeting the if condition are processed further.
I would like you to explain
What if dif<=7 in some transactions and dif>7 in other transactions for the same customer_id?
Customer_ID | Transaction_ID | ServiceDate | dif |
---|---|---|---|
AB1 | 12345 | 10/12/2018 | . |
AB1 | 12346 | 10/14/2018 | 2 |
AB1 | 12347 | 10/15/2018 | 1 |
AB2 | 22233 | 06/10/2018 | . |
AB2 | 22234 | 09/12/2018 | 94 |
AB3 | 35678 | 01/05/2019 | . |
That's is a great question. I did not think of that before.
Thanks for asking.
If diff is greater than 7 than I would like the Service Date to be compared with the last selected observation and perform the same logic.
Here, I have modified the Input and Output example. Hope this will make it clear and it's the same thing you were asking :
Input Table | |||
Customer_ID | Transaction_ID | Service Date | Diff |
AB1 | 12345 | 10/12/18 | - |
AB1 | 12346 | 10/14/18 | 2 |
AB1 | 12347 | 10/15/18 | 1 |
AB2 | 22233 | 6/10/18 | - |
AB2 | 22234 | 9/12/18 | 94 |
AB2 | 22356 | 9/15/18 | 3 |
AB3 | 35678 | 6/6/18 | - |
AB3 | 35679 | 6/13/18 | 7 |
AB3 | 35680 | 6/15/18 | 9 |
AB3 | 35681 | 6/18/18 | 3 |
Output Table | ||
Customer_ID | Transaction_ID | Service Date |
AB1 | 12345 | 10/12/18 |
AB2 | 22233 | 6/10/18 |
AB2 | 22234 | 9/12/18 |
AB3 | 35678 | 6/6/18 |
AB3 | 35680 | 6/15/18 |
Thanks for your quick response,
Varun
Hello,
Please provide your sample data in the form of a datastep, this will help more number of people responding with spending their little time.
Check this:
data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date mmddyy8.;
format Service_Date mmddyy10.;
datalines;
AB1 12345 10/12/18
AB1 12346 10/14/18
AB1 12347 10/15/18
AB2 22233 6/10/18
AB2 22234 9/12/18
AB3 35678 1/5/19
;
run;
proc sort data=have;
by Customer_ID Transaction_ID descending Service_Date;
run;
data have_dif;
set have;
by Customer_ID Transaction_ID descending Service_Date;
dif_dt=dif(Service_Date);
if first.Customer_ID then dif_dt=0;
run;
proc sql;
select Customer_ID ,Transaction_ID,Service_Date
from have_dif
where customer_id in (select customer_id from have_dif where dif_dt>7)
union
select Customer_ID ,Transaction_ID,Service_Date
from have_dif
where customer_id not in (select customer_id from have_dif where dif_dt>7)
group by Customer_ID
having Service_Date=min(Service_Date)
order by customer_id
;
quit;
You want to drop a transaction that is within 7 days of the prior transaction (presumably you want this date interval honored only WITHIN id's, not BETWEEN id's - that's @novinosrin's question):
data have;
input Customer_ID :$3. Transaction_ID Service_Date :mmddyy8.;
format service_date date9.;
datalines;;
AB1 12345 10/12/18
AB1 12346 10/14/18
AB1 12347 10/15/18
AB2 22233 6/10/18
AB2 22234 9/12/18
AB3 35678 1/5/19
run;
data want (drop=last_output_date);
set have ;
by customer_id;
if first.customer_id
or service_date-7 > last_output_date;
output;
last_output_date=service_date;
retain last_output_date;
run;
This program creates a variable LAST_OUTPUT_DATE, preserving the most recent date that passes your criterion. It is RETAINed across observations (not automatically reset to missing at top of data step), making it possible for you to compare the observation in hand to some previously generated value. This is necessary because you can't just compare one date to the immediately prior date - instead you may have to compare it to a date further back than the prior date. That's the situation for id AB1.
The IF statement above has no THEN clause. Therefore it is a subsetting if. Only observations meeting the if condition are processed further.
"help me with the code" can mean a lot of things, anything from:
If I were writing the code, I would begin with a simple step. Assuming the data set is in sorted order by Customer_ID, and assuming that Service_Date is an actual SAS date and not a character string:
proc summary data=have;
by customer_ID;
var Service_Date;
output out=ranges (keep=customer_ID First_Date Last_Date) min=first_date max=last_date;
run;
This gives you a SAS data set with one observation per Customer_ID, with the first and last date of service for that customer. If you had such a SAS data set, would you have any idea of what you need to do next? Are you able to outline a plan (not necessarily write the program)?
@VarunD wrote:
...
Service_Date is not actual SAS Date.
...
But Service_Date becomes a SAS datea via the INPUT statement below, which tells sas to interpret that field with the MMDDYY8. inbound format. This means that sas converts the input field to a sas datea value (a value recording the number of days after or before 1/1/1960). You can make these numbers easy to visually interpret by assign a FORMAT to the variable, so that you can realize what the date is for that date value.
input Customer_ID :$3. Transaction_ID Service_Date :mmddyy8.;
format service_date date9.;
Once this date value is constructed, then a subtraction between two values is the number of days by which they are separated.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.