lboyd
Calcite | Level 5

How can I delete duplicates of one variable based on a start date using NODUPKEY?

Thanks!

1 ACCEPTED SOLUTION

Reeza
Super User

Sort it twice: the first time by email address and start date, the second time by email address alone with the NODUPKEY option.


7 REPLIES
Reeza
Super User

 

Provide more details.




 

lboyd
Calcite | Level 5
I've been using this:

PROC SORT DATA=CMpre DUPOUT=results NODUPKEY;
    BY QID25;
RUN;

where QID25 is an email address. I need to get rid of duplicate email addresses, keeping the record with the earliest start date. The Startdate variable looks something like this:
22JUN17:00:00:00
Reeza
Super User

Sort it twice: the first time by email address and start date, the second time by email address alone with the NODUPKEY option.
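A minimal sketch of the two-pass sort, assuming the dataset and variable names from the post above (CMpre, QID25, Startdate). The output dataset names CMsorted and CMdedup are hypothetical, chosen only for illustration. The second pass relies on PROC SORT keeping records in their existing relative order within each BY group (the EQUALS behavior, which is the default), so the earliest record is the one NODUPKEY keeps.

```sas
/* Pass 1: order by email, then by start date, so the earliest
   record for each email comes first within its group. */
proc sort data=CMpre out=CMsorted;
    by QID25 Startdate;
run;

/* Pass 2: keep only the first record per email.
   Discarded duplicates are written to the DUPOUT= dataset. */
proc sort data=CMsorted out=CMdedup dupout=results nodupkey;
    by QID25;
run;
```

Writing to separate OUT= datasets leaves CMpre untouched, which makes it easier to compare the row counts before and after.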

lboyd
Calcite | Level 5
This code gets rid of missing e-mails. Is there a way to prevent that?
lboyd
Calcite | Level 5
Also, some of the e-mails start with an uppercase letter while others start with a lowercase letter. Is there any way to delete based on both? For instance, if someone said Sam123@gmail.com and also sam123@gmail.com, I'd want one of those deleted from the database.
Reeza
Super User
@lboyd wrote:
For instance, if someone said Sam123@gmail.com and also sam123@gmail.com, I'd want one of those deleted from the database.

To fix this, you need to clean your data first, for example by converting every email address to the same case before sorting.
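One possible way to do that cleaning, sketched under the assumption that the input is the CMpre dataset with the email in QID25, as described earlier in the thread. The names CMclean and email_key are hypothetical. LOWCASE and STRIP are standard SAS functions; the original QID25 value is left unchanged.

```sas
data CMclean;
    set CMpre;
    /* Build a comparison key: remove leading/trailing blanks and
       convert to lowercase, so Sam123@gmail.com and
       sam123@gmail.com produce the same value. */
    email_key = lowcase(strip(QID25));
run;
```

Any later sort or BY-group step would then use email_key in place of QID25 as the grouping variable.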

 

@lboyd wrote:
This code gets rid of missing e-mails, is there a way to prevent that?

To deal with this, you likely need to do it manually: sort first, then use a DATA step with FIRST./LAST. processing, coding an exception for the missing emails.

 

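/* Sort by the grouping variable so BY-group processing works. */
proc sort data=have;
    by group_var;
run;

data want;
    set have;
    by group_var;
    /* Keep the first record of each group, plus every record
       whose grouping variable is missing. */
    if first.group_var or missing(group_var);
run;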
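As an illustration, the same pattern might be applied to the variables from the question, with the start date added to the sort so that the first record per email is the earliest one. The sample rows are invented, and the names CMsorted and CMdedup are hypothetical.

```sas
/* Hypothetical sample data standing in for CMpre. */
data CMpre;
    infile datalines truncover;
    input QID25 :$40. Startdate :datetime18.;
    format Startdate datetime16.;
    datalines;
sam123@gmail.com 23JUN17:00:00:00
sam123@gmail.com 22JUN17:00:00:00
. 24JUN17:00:00:00
. 25JUN17:00:00:00
;
run;

/* Order by email, then by start date (earliest first). */
proc sort data=CMpre out=CMsorted;
    by QID25 Startdate;
run;

data CMdedup;
    set CMsorted;
    by QID25;
    /* Keep the earliest record per email, and keep every
       record where the email is missing. */
    if first.QID25 or missing(QID25);
run;
```

With these sample rows, CMdedup should contain both records with a missing email plus the 22JUN17 record for sam123@gmail.com.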



lboyd
Calcite | Level 5
Thank you!


Discussion stats
  • 7 replies
  • 1780 views
  • 0 likes
  • 2 in conversation