- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone! I was wondering how to delete duplicates if a specific variable="yes".
Essentially, I want the code to say IF an observation is a duplicate AND variable = "yes" then delete.
To find duplicates I've been using "if first.var=0 or last.var=0" but I can't seem to combine this with an "AND" statement as well.
Is there any way I can do this?
Thanks a lot in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The condition should be
if (first.var=0 or last.var=0) and variable = "yes" then delete;
this will get rid of all copies. If you want to keep, say, the first copy and get rid of the others, you could use
if first.var=0 and variable = "yes" then delete;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please supply example data in a data step with datalines, and show the expected result from that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the quick response!
Example data would be (lib1.have)
Company Year Variable
Apple 2010 Yes
Apple 2010 No
Microsoft 2012 Yes
Microsoft 2012 No
Then the desired output would be (lib1.want)
Company Year Variable
Apple 2010 No
Microsoft 2012 No
I've been doing something like the code below, but I'm not sure if it's ignoring the first "if" statement and simply deleting everything with variable='Yes'
data lib1.have;
set lib1.want;
by company year
if first.year=0 or last.year=0;
if variable='Yes' then delete;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using - if not (first.var and last.var); - will pick up duplicate BY variables. However what does "delete duplicate" mean? Delete all rows in a duplicate group, keep the first row in a duplicate group or the last row?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The condition should be
if (first.var=0 or last.var=0) and variable = "yes" then delete;
this will get rid of all copies. If you want to keep, say, the first copy and get rid of the others, you could use
if first.var=0 and variable = "yes" then delete;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's possible the solution could be a little complex.
What should happen if there are 3 observations, but they are all "No"? Delete all? Delete all but one? Which one (or does it matter)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In that case I'd like to delete all but one observation, how would I code for this? I'm quite new to SAS so I'm still figuring out the more complex issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's an approach that might be just what you want.
proc sort data=have;
by company year variable;
run;
data want;
set have;
by company year;
if first.year;
run;
It selects just one observation per COMPANY/YEAR, taking a "No" if one exists, and a "Yes" if there aren't any "No" in the data. There's some question in my mind whether this is the result you are looking for, so you will have to comment on that.