Solved: Re: Remove entry that is duplicated in one variable but not others

sasprogramming · Posted 12-03-2019 06:48 PM

Below is a snippet from a very large data set. My problem concerns duplicates. As you can see below, there are duplicated values in the 'ID' column, but the values in the other columns are not duplicated. What I want to do is:

Remove the row with the 'first' duplicate, where there is a zero entry in Var1 and blanks in Var2 and Var3,
thereby keeping the row where there is information for all variables.

How can I achieve this in SAS?

Thanks

ballardw · Posted 12-03-2019 07:00 PM

One way:

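/* Order rows within each ID so the lowest Var1 value comes first */
proc sort data=have;
   by id Var1;
run;

data want;
   set have;
   by id;
   /* Drop the first row of an ID group when it is the zero/blank placeholder */
   if first.id and var1=0 and missing(var2) and missing(var3) then delete;
run;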
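
One thing to watch with the step above: an ID that appears only once, with Var1=0 and blank Var2 and Var3, would also be deleted, because its single row is first.id. If such IDs should be kept, a possible variant is to add a "not last.id" check. The sketch below is only an illustration: the data set "have" is built from made-up values (the original snippet is not reproduced here), and Var2 and Var3 are assumed to be numeric.

```sas
/* Hypothetical sample data, invented for illustration only.        */
/* Var2 and Var3 are assumed numeric; consecutive commas read as    */
/* missing values because of the DSD option.                        */
data have;
   infile datalines dsd truncover;
   input id var1 var2 var3;
   datalines;
1,0,,
1,5,10,20
2,0,,
2,7,11,21
3,0,,
;
run;

/* Within each ID, put the lowest Var1 value first */
proc sort data=have;
   by id var1;
run;

data want;
   set have;
   by id;
   /* Delete the zero/blank row only when the ID has more than one row: */
   /* "not last.id" is false for an ID with a single row, so it is kept */
   if first.id and not last.id
      and var1=0 and missing(var2) and missing(var3) then delete;
run;

proc print data=want noobs;
run;
```

With this made-up data, IDs 1 and 2 should each lose their zero row, while ID 3 keeps its only row. Whether that is the right behavior depends on how single-row IDs should be treated in the real data.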

sasprogramming · Posted 12-03-2019 09:32 PM

That worked, thank you!
