BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lb16fa
Fluorite | Level 6

Hi everyone! I was wondering how to delete duplicates if a specific variable="yes".

Essentially, I want the code to say IF an observation is a duplicate AND variable = "yes" then delete. 

To find duplicates I've been using "if first.var=0 or last.var=0" but I can't seem to combine this with an "AND" statement as well.

Is there any way I can do this?

Thanks a lot in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

The condition should be

 

if (first.var=0 or last.var=0) and variable = "yes" then delete;

 

this will get rid of all copies. If you want to keep, say, the first copy and get rid of the others, you could use

 

if first.var=0 and variable = "yes" then delete;

PG

View solution in original post

8 REPLIES 8
lb16fa
Fluorite | Level 6

Thank you for the quick response!

Example data would be (lib1.have)

Company      Year         Variable           

Apple             2010        Yes

Apple             2010        No

Microsoft        2012        Yes

Microsoft        2012        No

 

Then the desired output would be (lib1.want)

Company        Year       Variable

Apple             2010        No

Microsoft        2012        No

I've been doing something like the code below, but I'm not sure if it's ignoring the first "if" statement and simply deleting everything with variable='Yes'

data lib1.have;
set lib1.want;
by company year
if first.year=0 or last.year=0;
if variable='Yes' then delete;
run;

 

SASKiwi
PROC Star

Using - if not (first.var and last.var); - will pick up duplicate BY variables. However what does "delete duplicate" mean? Delete all rows in a duplicate group, keep the first row in a duplicate group or the last row?

PGStats
Opal | Level 21

The condition should be

 

if (first.var=0 or last.var=0) and variable = "yes" then delete;

 

this will get rid of all copies. If you want to keep, say, the first copy and get rid of the others, you could use

 

if first.var=0 and variable = "yes" then delete;

PG
Astounding
PROC Star

It's possible the solution could be a little complex.

 

What should happen if there are 3 observations, but they are all "No"?  Delete all?  Delete all but one?   Which one (or does it matter)?

lb16fa
Fluorite | Level 6

In that case I'd like to delete all but one observation, how would I code for this? I'm quite new to SAS so I'm still figuring out the more complex issues.

SASKiwi
PROC Star

@lb16fa  - That question has already been answered by @PGStats in his response.

Astounding
PROC Star

Here's an approach that might be just what you want.  

proc sort data=have;
   by company year variable;
run;
data want;
   set have;
   by company year;
   if first.year;
run;

It selects just one observation per COMPANY/YEAR, taking a "No" if one exists, and a "Yes" if there aren't any "No" in the data.  There's some question in my mind whether this is the result you are looking for, so you will have to comment on that.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3532 views
  • 0 likes
  • 5 in conversation