Hello all. I have the below table and would like to keep the record with delivery date that is nearest to the order date for every combination of customer and order date. In this case, I would like to keep record 1, 3 and 5. I'm not sure how to write the code.
Thank you in advance!
customer_id order_date delivery_date
1 01JAN2021 01JAN2021
1 01JAN2021 10JAN2021
1 02JAN2021 10JAN2021
1 02JAN2021 11JAN2021
2 02JAN2021 10JAN2021
2 02JAN2021 20JAN2021
This can be done using FIRST. and LAST. Data step variables
/* Sample data */
data got ;
infile cards ;
input
customer_id $
order_date : date9.
delivery : date9. ;
format
order_date date9.
delivery date9.
;
cards ;
1 01JAN2021 01JAN2021
1 01JAN2021 10JAN2021
1 02JAN2021 10JAN2021
1 02JAN2021 11JAN2021
2 02JAN2021 10JAN2021
2 02JAN2021 20JAN2021
;
/* Sort the data */
proc sort
data=got
out=srtdGot ;
by
customer_id order_date delivery ;
run ;
data want ;
set srtdGot ;
by
customer_id order_date delivery ;
/* If it's the first occurance of order_date then you are going to want this */
if first.order_date then
output want ;
run ;
This "mean" example doesn't work:
data have;
input customer_id (order_date delivery) (:date9.);
format order_date delivery date9.;
OBS = _N_;
cards;
1 01JAN2021 01JAN2021
1 01JAN2021 10JAN2021
1 01JAN2021 .
1 02JAN2021 10JAN2021
1 02JAN2021 11JAN2021
2 02JAN2021 10JAN2021
2 02JAN2021 20JAN2021
;
run;
Bart
Hi,
this may help:
data have;
input customer_id (order_date delivery_date) (:date9.);
format order_date delivery_date date9.;
OBS = _N_;
cards;
1 01JAN2021 01JAN2021
1 01JAN2021 10JAN2021
1 02JAN2021 10JAN2021
1 02JAN2021 11JAN2021
2 02JAN2021 10JAN2021
2 02JAN2021 20JAN2021
;
run;
proc print;
run;
data want;
_min_ = constant("big");
do _N_ = 1 by 1 until(last.order_date);
set have;
by customer_id order_date;
_x_ = abs(order_date-delivery_date);
if _min_ > _x_ > . then
do;
_min_ = _x_;
_O_ = _N_;
end;
end;
do _N_ = 1 to _N_;
set have;
if _N_ = _O_ then output;
end;
drop _:;
run;
proc print;
run;
Bart
In the following code observations with closed date are ranked based on the closeness.
The closest date will have rank 1, the next will have 2,
You can filter based on how many records you want..
I have named the ranking variable as time_rank. You can call anything else.
For each customer there will be a different rank.
Please let me know if you have questions.
Here is the code:
/*---------------------------------------------*/
data test;
Retain customer_id order_date delivery_date;
format order_date delivery_date date9.;
informat order_date delivery_date date9.;
input customer_id order_date delivery_date;
date_diff=delivery_Date -order_date;
datalines;
1 01JAN2021 01JAN2021
1 01JAN2021 10JAN2021
1 02JAN2021 10JAN2021
1 02JAN2021 11JAN2021
2 02JAN2021 10JAN2021
2 02JAN2021 20JAN2021
;
;
run;
proc sort data=test;
by customer_id order_date delivery_date;;
run;
proc rank data=test out=test2 (drop=date_diff) ties=low;
by customer_id;
var date_diff;
ranks time_rank;
run;
proc sort data=test2;
by customer_id time_rank;
run;
The output will be like this:
customer_id order_date delivery_date time_rank
1 01JAN2021 01JAN2021 1
1 02JAN2021 10JAN2021 2
1 01JAN2021 10JAN2021 3
1 02JAN2021 11JAN2021 3
2 02JAN2021 10JAN2021 1
2 02JAN2021 20JAN2021 2
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.