DATA Step, Macro, Functions and more

Remove duplicate records without missing values

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

Remove duplicate records without missing values

 I want to remove the duplicate records but the ones that have missing values as follows

Input

123412
1234 
1234 
123513
123513
123611
1237 
1237 
1237 
123710
1238 
1238 

 

Desired Output

123412
123513
123611
123710
1238 

Accepted Solutions
Solution
‎08-16-2016 11:17 PM
Super User
Posts: 10,044

Re: Remove duplicate records without missing values


data have;
infile cards truncover expandtabs;
input id x;
cards;
1234	12
1234	 
1234	 
1235	13
1235	13
1236	11
1237	 
1237	 
1237	 
1237	10
1238	 
1238	 
;
run;
proc sort data=have;
 by id descending x;
run;
data want;
 set have;
 by id;
 if first.id;
run;

View solution in original post


All Replies
Solution
‎08-16-2016 11:17 PM
Super User
Posts: 10,044

Re: Remove duplicate records without missing values


data have;
infile cards truncover expandtabs;
input id x;
cards;
1234	12
1234	 
1234	 
1235	13
1235	13
1236	11
1237	 
1237	 
1237	 
1237	10
1238	 
1238	 
;
run;
proc sort data=have;
 by id descending x;
run;
data want;
 set have;
 by id;
 if first.id;
run;

Respected Advisor
Posts: 4,932

Re: Remove duplicate records without missing values

Use the fact that missing values are inferior to all non-missing but are not included in the evaluation of MIN:

 

data test;
infile datalines missover;
input a b;
datalines;
1234	12
1234	 
1234	 
1235	13
1235	13
1236	11
1237	 
1237	 
1237	 
1237	10
1238	 
1238	 
;

proc sql;
select unique *
from test
group by a
having b >= min(b);
quit;
PG
Contributor
Posts: 71

Re: Remove duplicate records without missing values

[ Edited ]

@PGStats : Its not working for the records where 'b' in all the duplicate records is missing

Respected Advisor
Posts: 4,932

Re: Remove duplicate records without missing values

I get the result

 

                                        a         b
                                 --------------------------
                                     1234        12
                                     1235        13
                                     1236        11
                                     1237        10
                                     1238         .

case where a=1238 seems to be handled properly.

 

BTW, @Ksharp's solution would not return the case 1235 12, which is not a duplicate, if it was present in your data. 

PG
Super User
Posts: 10,044

Re: Remove duplicate records without missing values

OK. If there are other non missing value in a group , which you want keep:



data have;
infile cards truncover expandtabs;
input id x;
cards;
1234	12
1234	 
1234	 
1235	13
1235	13
1235    12
1235    
1236	11
1237	 
1237	 
1237	 
1237	10
1238	 
1238	 
;
run;
proc sort data=have;
 by id descending x;
run;
data want;
 set have;
 by id descending x ;
 if first.x and not missing(x) or first.id;
run;


☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 331 views
  • 2 likes
  • 3 in conversation