lboyd
Calcite | Level 5

How can I use NODUPKEY to delete duplicate values of one variable, choosing which record to keep based on a start date?

Thanks!

Accepted Solution
Reeza
Super User

Sort it twice: first by email address and date, then by email address alone with the NODUPKEY option.


Reeza
Super User

 

Please provide more details.



lboyd
Calcite | Level 5
I've been using this:
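/* Keeps the first record for each QID25; the removed duplicates are written to RESULTS.
   With no OUT= option, the deduplicated data replaces CMpre. */
PROC SORT DATA=CMpre DUPOUT=results NODUPKEY ;
BY QID25;
RUN ;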

Here QID25 is an email address. I need to get rid of duplicate email addresses, keeping the one with the earliest start date. The Startdate variable looks something like this:
22JUN17:00:00:00
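Before sorting by date, it may be worth confirming that Startdate is a numeric SAS datetime (PROC CONTENTS lists it as Num with a DATETIME format) rather than character text. If it were text, a sort would be alphabetical, so 01JUL17 would come before 15MAY17. A minimal sketch of converting it, assuming hypothetical data set names CMpre_char and CMpre_dt and a new variable start_dt:

```sas
/* Hypothetical sample: startdate read as character text, not a SAS datetime */
data CMpre_char;
  length QID25 $40 startdate $16;
  input QID25 $ startdate $;
  datalines;
sam123@gmail.com 01JUL17:00:00:00
sam123@gmail.com 15MAY17:00:00:00
;
run;

/* Convert the text to a numeric SAS datetime so it sorts chronologically */
data CMpre_dt;
  set CMpre_char;
  start_dt = input(startdate, datetime16.);
  format start_dt datetime16.;
run;

/* Chronological order: 15MAY17 now comes before 01JUL17 */
proc sort data=CMpre_dt;
  by QID25 start_dt;
run;
```

If Startdate is already numeric, this step can be skipped and the variable sorted directly.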
Reeza
Super User

Sort it twice: first by email address and date, then by email address alone with the NODUPKEY option.
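A minimal sketch of the two-sort approach. The sample rows and the data set names sorted and deduped are hypothetical; CMpre, QID25, Startdate and RESULTS come from the thread. The second sort keeps the earliest date because PROC SORT's default EQUALS behavior preserves the relative order of records with the same BY value, so NODUPKEY retains the first record produced by the first sort.

```sas
/* Hypothetical sample data standing in for CMpre */
data CMpre;
  input QID25 :$40. startdate :datetime16.;
  format startdate datetime16.;
  datalines;
sam123@gmail.com 01JUL17:00:00:00
ann@example.com 22JUN17:00:00:00
sam123@gmail.com 15MAY17:00:00:00
ann@example.com 03JUN17:00:00:00
;
run;

/* Sort 1: within each email, put the earliest start date first */
proc sort data=CMpre out=sorted;
  by QID25 startdate;
run;

/* Sort 2: NODUPKEY keeps the first record per email (the earliest date);
   the dropped later records go to RESULTS */
proc sort data=sorted out=deduped nodupkey dupout=results;
  by QID25;
run;
```

With this sample, deduped holds one row per email (ann@example.com with 03JUN17, sam123@gmail.com with 15MAY17) and RESULTS holds the two later rows.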

lboyd
Calcite | Level 5
This code also removes records with missing e-mails. Is there a way to prevent that?
lboyd
Calcite | Level 5
Also, some of the e-mails start with an uppercase letter while others start with a lowercase one. Is there any way to treat them as duplicates regardless of case? For instance, if someone entered both Sam123@gmail.com and sam123@gmail.com, I'd want one of those deleted from the database.
Reeza
Super User
For instance, if someone entered both Sam123@gmail.com and sam123@gmail.com, I'd want one of those deleted from the database.

To fix this, you need to clean your data first.
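One way to do that cleaning, sketched here with hypothetical sample data, is to build a comparison key with LOWCASE and STRIP and deduplicate on that key instead of the raw address. The email_key variable and the CMclean, sorted and deduped names are assumptions for illustration; the original QID25 value is left untouched.

```sas
/* Hypothetical sample: the same address entered with different case */
data CMpre;
  input QID25 :$40. startdate :datetime16.;
  format startdate datetime16.;
  datalines;
Sam123@gmail.com 22JUN17:00:00:00
sam123@gmail.com 15MAY17:00:00:00
ann@example.com 03JUN17:00:00:00
;
run;

/* email_key (hypothetical): lowercase, surrounding blanks removed */
data CMclean;
  set CMpre;
  length email_key $40;
  email_key = lowcase(strip(QID25));
run;

/* Same two-sort approach, but on the cleaned key */
proc sort data=CMclean out=sorted;
  by email_key startdate;
run;

proc sort data=sorted out=deduped nodupkey;
  by email_key;
run;
```

Here Sam123@gmail.com and sam123@gmail.com share the key sam123@gmail.com, so only the earlier record (15MAY17) survives.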

 

This code also removes records with missing e-mails. Is there a way to prevent that?

To deal with this, you likely need to do it manually: first sort, then use a DATA step with FIRST./LAST. processing, coding an exception for the missing emails.


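/* Sort so that records with the same group_var value are adjacent */
proc sort data=have;
  by group_var;
run;

data want;
  set have;
  by group_var;
  /* Keep the first record of each group, plus every record where group_var is missing */
  if first.group_var or missing(group_var);
run;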
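Applied to the thread's variables, the same pattern might look like the sketch below. The sample rows and the data set names sorted and want are hypothetical. Sorting by QID25 and Startdate puts the earliest record first within each email (missing values sort first), and the subsetting IF keeps that record plus every row with a blank email, which NODUPKEY would otherwise collapse into a single row.

```sas
/* Hypothetical sample: two records have a missing email */
data CMpre;
  input QID25 :$40. startdate :datetime16.;
  format startdate datetime16.;
  datalines;
sam123@gmail.com 01JUL17:00:00:00
. 22JUN17:00:00:00
sam123@gmail.com 15MAY17:00:00:00
. 03JUN17:00:00:00
ann@example.com 03JUN17:00:00:00
;
run;

/* Within each email, earliest start date first; missing emails sort first */
proc sort data=CMpre out=sorted;
  by QID25 startdate;
run;

/* Keep the earliest record per email, plus every record with a missing email */
data want;
  set sorted;
  by QID25;
  if first.QID25 or missing(QID25);
run;
```

With this sample, want keeps four rows: both missing-email records, ann@example.com, and sam123@gmail.com with 15MAY17.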



lboyd
Calcite | Level 5
Thank you!
