Hi everyone! I was wondering how to delete duplicates if a specific variable="yes".
Essentially, I want the code to say IF an observation is a duplicate AND variable = "yes" then delete.
To find duplicates I've been using "if first.var=0 or last.var=0" but I can't seem to combine this with an "AND" statement as well.
Is there any way I can do this?
Thanks a lot in advance!
The condition should be
if (first.var=0 or last.var=0) and variable = "yes" then delete;
this will get rid of all copies. If you want to keep, say, the first copy and get rid of the others, you could use
if first.var=0 and variable = "yes" then delete;
Please supply example data in a data step with datalines, and show the expected result from that.
Thank you for the quick response!
Example data would be (lib1.have)
Company Year Variable
Apple 2010 Yes
Apple 2010 No
Microsoft 2012 Yes
Microsoft 2012 No
Then the desired output would be (lib1.want)
Company Year Variable
Apple 2010 No
Microsoft 2012 No
I've been doing something like the code below, but I'm not sure if it's ignoring the first "if" statement and simply deleting everything with variable='Yes'
data lib1.have;
set lib1.want;
by company year
if first.year=0 or last.year=0;
if variable='Yes' then delete;
run;
Using - if not (first.var and last.var); - will pick up duplicate BY variables. However what does "delete duplicate" mean? Delete all rows in a duplicate group, keep the first row in a duplicate group or the last row?
The condition should be
if (first.var=0 or last.var=0) and variable = "yes" then delete;
this will get rid of all copies. If you want to keep, say, the first copy and get rid of the others, you could use
if first.var=0 and variable = "yes" then delete;
It's possible the solution could be a little complex.
What should happen if there are 3 observations, but they are all "No"? Delete all? Delete all but one? Which one (or does it matter)?
In that case I'd like to delete all but one observation, how would I code for this? I'm quite new to SAS so I'm still figuring out the more complex issues.
Here's an approach that might be just what you want.
proc sort data=have;
by company year variable;
run;
data want;
set have;
by company year;
if first.year;
run;
It selects just one observation per COMPANY/YEAR, taking a "No" if one exists, and a "Yes" if there aren't any "No" in the data. There's some question in my mind whether this is the result you are looking for, so you will have to comment on that.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.