I have a dataset with many duplicates for each customer ID, what I want to achieve is keeping only one record in each month for one customer. If multiple records in a month, select the record with the earliest date. How could I achieve this? I have tried first. , last. but it only gives me one cust_ID one record.
for example:
cust_id date month
12345 21APR23 15:48:30
12345 23APR23 15:50:30
12345 23APR23 17:48:30
12345 25APR23 11:48:30
12345 21Jun23 15:35:30
want:
12345 21APR23 15:48:30
12345 21Jun23 15:35:30
data Have;
input cust_id $ date datetime.;
format date datetime21. month monyy7.;
month = intnx('MONTH', datepart(date), 0, 'B');
datalines;
12345 21APR2023:15:48:30
12345 23APR2023:15:50:30
12345 23APR2023:17:48:30
12345 25APR2023:11:48:30
12345 21Jun2023:15:35:30
;
run;
proc sort data = Have;
by cust_id month date;
run;
data Want;
set Have;
by cust_id month;
if first.month;
run;
data Have;
input cust_id $ date datetime.;
format date datetime21. month monyy7.;
month = intnx('MONTH', datepart(date), 0, 'B');
datalines;
12345 21APR2023:15:48:30
12345 23APR2023:15:50:30
12345 23APR2023:17:48:30
12345 25APR2023:11:48:30
12345 21Jun2023:15:35:30
;
run;
proc sort data = Have;
by cust_id month date;
run;
data Want;
set Have;
by cust_id month;
if first.month;
run;
If your dataset has a DATE variable but doesn't already have a MONTH variable, you don't need to create MONTH:
data want;
set have;
by cust_id ;
if first.id=1 or intck('month',date,lag(date))^=0;
run;
Stealing the example data set from @SASKiwi
data Have; input cust_id $ date datetime.; format date datetime21. month monyy7.; month = intnx('MONTH', datepart(date), 0, 'B'); datalines; 12345 21APR2023:15:48:30 12345 23APR2023:15:50:30 12345 23APR2023:17:48:30 12345 25APR2023:11:48:30 12345 21Jun2023:15:35:30 ; run; proc summary data=have nway; class cust_id date; format date dtmonyy7. ; output out=want (drop=_:); run; proc print data=want; format date datetime18.; run;
When you know an appropriate format and apply it to a Class variable, such as the Dtmonyy, then summary will return the lowest numeric value in the format created group as the value. This particular summary otherwise counts observations so we drop those count variables as not wanted for this exercise.
Note: Class may not be appropriate for large data sets with many combinations of the Class variables.
Maybe give a try with a month variable and proc sort nodupkey:
data snacks; set SASHELP.snacks; month=compress(year(Date)||month(date)); run; proc sort data=snacks nodupkey;by product month;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.