lboyd
Calcite | Level 5

How can I delete duplicates of one variable based on a start date using NODUPKEY?

Thanks!

Accepted Solution
Reeza
Super User

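Sort it twice. The first time, sort by email address and start date. The second time, sort by email address alone with the NODUPKEY option, which keeps the first record in each group (here, the earliest start date).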


Reeza
Super User

 

Provide more details.


@lboyd wrote:

How can I delete duplicates of one variable based on a start date using NODUPKEY?

Thanks!


 

lboyd
Calcite | Level 5
I've been using this:
PROC SORT DATA=CMpre DUPOUT=results NODUPKEY ;
BY QID25;
RUN ;

Here QID25 is an email address. I need to get rid of duplicate email addresses and keep the record with the earliest start date. The start date variable looks something like this:
22JUN17:00:00:00
Reeza
Super User

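Sort it twice. The first time, sort by email address and start date. The second time, sort by email address alone with the NODUPKEY option, which keeps the first record in each group (here, the earliest start date).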
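
The two-pass approach could look roughly like the sketch below. It reuses the CMpre, QID25, and results names from the earlier post; the variable name startdate and the output dataset names CMsorted and CMdedup are assumptions made for illustration. It also relies on PROC SORT's default EQUALS behavior, which preserves the relative order of records within each BY group.

```sas
/* Pass 1: order by email, then by start date, so the earliest
   record comes first within each email address.
   (startdate is an assumed variable name.) */
proc sort data=CMpre out=CMsorted;
    by QID25 startdate;
run;

/* Pass 2: keep only the first record per email address.
   Records that are dropped are written to the DUPOUT= dataset. */
proc sort data=CMsorted out=CMdedup dupout=results nodupkey;
    by QID25;
run;
```

Writing to separate OUT= datasets leaves CMpre untouched, so the original records remain available if the result needs checking.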

lboyd
Calcite | Level 5
This code gets rid of missing e-mails. Is there a way to prevent that?
lboyd
Calcite | Level 5
Also, some of the e-mails start with an uppercase letter while others start with lowercase. Is there any way to treat those as duplicates too? For instance, if someone entered Sam123@gmail.com and also sam123@gmail.com, I'd want one of those deleted from the database.
Reeza
Super User
For instance, if someone said Sam123@gmail.com and also sam123@gmail.com, I'd want one of those deleted from the database.

 

To fix this, you need to clean your data first, for example by standardizing the case of the email addresses before you sort.
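
One way to do that cleaning, as a rough sketch: build a lowercased copy of the email address and de-duplicate on that copy instead of the raw value. The names email_key, CMclean, CMsorted, CMdedup, and startdate, and the length of 200, are assumptions for illustration; CMpre, QID25, and results come from the earlier post.

```sas
/* Build a case-insensitive key; the original QID25 value is kept as entered. */
data CMclean;
    set CMpre;
    length email_key $200;              /* assumed length, adjust to the data */
    email_key = lowcase(strip(QID25));  /* drop leading/trailing blanks, lowercase */
run;

/* Earliest start date first within each cleaned email address. */
proc sort data=CMclean out=CMsorted;
    by email_key startdate;
run;

/* Keep one record per cleaned email address. */
proc sort data=CMsorted out=CMdedup dupout=results nodupkey;
    by email_key;
run;
```

Note that NODUPKEY still treats all missing addresses as one key, so this sketch on its own keeps only one record with a missing email.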

 

This code gets rid of missing e-mails. Is there a way to prevent that?

 

To deal with this, you likely need to handle it manually.

First sort, then use a DATA step with FIRST./LAST. processing, coding an exception for the missing emails.

 

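/* Sort so that all rows for the same group are adjacent */
proc sort data=have;
    by group_var;
run;

data want;
    set have;
    by group_var;
    /* Keep the first row of each group, plus every row where the key is missing */
    if first.group_var or missing(group_var);
run;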
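
Applied to the dataset from earlier in the thread, the same pattern might look like the sketch below. CMpre and QID25 come from the original post; startdate, CMsorted, and CMdedup are assumed names. Adding the start date to the sort means the first row in each email group is the earliest one.

```sas
/* Order by email, then start date (startdate is an assumed variable name). */
proc sort data=CMpre out=CMsorted;
    by QID25 startdate;
run;

data CMdedup;
    set CMsorted;
    by QID25;
    /* Keep the earliest row per email, and keep every row with a missing email. */
    if first.QID25 or missing(QID25);
run;
```

Unlike NODUPKEY, this version does not write the dropped rows anywhere; a second dataset on the DATA statement with an explicit OUTPUT would be needed for that.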



lboyd
Calcite | Level 5
Thank you!
