BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zhige50
Obsidian | Level 7

I have a dataset with many duplicates for each customer ID, what I want to achieve is keeping only one record in each month for one customer. If multiple records in a month, select the record with the earliest date. How could I achieve this? I have tried first. , last. but it only gives me one cust_ID one record.

for example:

cust_id date month

12345 21APR23 15:48:30

12345 23APR23 15:50:30

12345 23APR23 17:48:30

12345 25APR23 11:48:30

12345 21Jun23 15:35:30

 

want:

12345 21APR23 15:48:30

12345 21Jun23 15:35:30

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star
data Have;
  input cust_id $ date datetime.;
  format date datetime21. month monyy7.;
  month = intnx('MONTH', datepart(date), 0, 'B');

datalines;
12345 21APR2023:15:48:30
12345 23APR2023:15:50:30
12345 23APR2023:17:48:30
12345 25APR2023:11:48:30
12345 21Jun2023:15:35:30
;
run;

proc sort data = Have;
  by cust_id month date;
run;

data Want;
  set Have;
  by  cust_id month;
  if first.month;
run;

View solution in original post

4 REPLIES 4
SASKiwi
PROC Star
data Have;
  input cust_id $ date datetime.;
  format date datetime21. month monyy7.;
  month = intnx('MONTH', datepart(date), 0, 'B');

datalines;
12345 21APR2023:15:48:30
12345 23APR2023:15:50:30
12345 23APR2023:17:48:30
12345 25APR2023:11:48:30
12345 21Jun2023:15:35:30
;
run;

proc sort data = Have;
  by cust_id month date;
run;

data Want;
  set Have;
  by  cust_id month;
  if first.month;
run;
mkeintz
PROC Star

If your dataset has a DATE variable but doesn't already have a MONTH variable, you don't need to create MONTH: 

 

data want;
  set have;
  by cust_id ;
  if first.id=1   or   intck('month',date,lag(date))^=0;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

Stealing the example data set from @SASKiwi 

 

data Have;
  input cust_id $ date datetime.;
  format date datetime21. month monyy7.;
  month = intnx('MONTH', datepart(date), 0, 'B');

datalines;
12345 21APR2023:15:48:30
12345 23APR2023:15:50:30
12345 23APR2023:17:48:30
12345 25APR2023:11:48:30
12345 21Jun2023:15:35:30
;
run;

proc summary data=have nway;
   class cust_id date;
   format date dtmonyy7. ;
   output out=want (drop=_:);
run;

proc print data=want;
   format date datetime18.;
run;

When you know an appropriate format and apply it to a Class variable, such as the Dtmonyy, then summary will return the lowest numeric value in the format created group as the value. This particular summary otherwise counts observations so we drop those count variables as not wanted for this exercise.

 

Note: Class may not be appropriate for large data sets with many combinations of the Class variables.

framon
Calcite | Level 5

Maybe give a try with a month variable and proc sort nodupkey:

 

data snacks;
	set SASHELP.snacks;
	month=compress(year(Date)||month(date));
run;

proc sort data=snacks nodupkey;by product month;run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 640 views
  • 0 likes
  • 5 in conversation