How to remove all rows if duplicate

Mchan890 · Posted 03-09-2020 04:32 PM

I have a large data set:

Have:

PO Amount Line
1212 300 1
1233 250 1
1233 600 2
1345 1520 1
1350 1000 2
1350 500 3

Want:

PO Amount Line
1212 300 1
1345 1520 1

I know NODUPKEY will remove the duplicates; however, I am trying to remove every row for a PO when that PO appears more than once.

novinosrin · Posted 03-09-2020 04:36 PM

Hi @Mchan890 Please try


data have;
input PO   Amount   Line;
cards;
1212        300     1         
1233        250     1        
1233        600     2        
1345      1520     1        
1350      1000     2
1350        500     3
;

data want;
 set have;
 by po;
 if first.po and last.po;
run;

/*or*/


proc sql;
create table want as
select *
from have
group by po
having count(*)=1;
quit;
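
The DATA step version relies on BY-group processing, so HAVE must already be sorted by PO (the sample data is). For unsorted input, a minimal sketch would sort a copy first; have_sorted is a hypothetical data set name, not something from this thread. The PROC SQL version does not need the sort.

```sas
/* Sort a copy so that BY po is valid; have_sorted is a hypothetical name */
proc sort data=have out=have_sorted;
   by po;
run;

data want;
   set have_sorted;
   by po;
   /* keep a row only when it is both the first and the last row of its PO group */
   if first.po and last.po;
run;
```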

Reeza · Posted 03-09-2020 04:37 PM

PROC SORT has a ~~UNIQUERECS~~ UNIQUEOUT option as well.

EDIT: corrected, as per @ballardw code - which is a correct answer.

novinosrin · Posted 03-09-2020 04:45 PM

Thank you @Reeza learned something new

ballardw · Posted 03-09-2020 04:49 PM

Since you want unique values of PO then:

data have;
   input PO   Amount   Line   ;
datalines;
1212        300     1         
1233        250     1        
1233        600     2        
1345      1520     1        
1350      1000     2
1350        500     3
;

Proc sort data=have out=have uniqueout=want nouniquekey;
   by po ;
run;

The data set you want is the one specified by the UNIQUEOUT= option, and NOUNIQUEKEY is the appropriate counterpart to NODUPKEY.

Please note how the data was provided as data step code so that we have something to test with.

novinosrin · Posted 03-09-2020 04:53 PM

Hmm Okay, so it's called UNIQUEOUT . I will have to remember that. Thank you!

Reeza · Posted 03-09-2020 04:55 PM

There used to be a UNIQUERECS option; it was removed because it was confusing and didn't work the way people expected, and then they added UNIQUEOUT, so I get them confused sometimes 🙂

novinosrin · Posted 03-09-2020 04:58 PM

Good point. I think it would help to keep abreast of which options are active and which are not. I suppose there must be some docs, but it really is confusing. Sir @ballardw is very good at knowing the latest stuff in the docs.

Reeza · Posted 03-09-2020 05:01 PM

Unfortunately the key is checking it frequently....and I don't actually use SAS day to day anymore, which is why my participation is 'lower' and declining :(. We do everything at my current shop in R.

ChrisNZ · Posted 03-09-2020 10:15 PM

@Reeza Your lower is most people's unattainable!

ballardw · Posted 03-09-2020 05:50 PM

@Reeza wrote:
There used to be a UNIQUERECS option, it was removed because it was confusing and didn't work the way people expected and then they added UNIQUEOUT so I get them confused sometimes 🙂

I'm glad I wasn't the only one remembering another option that is no longer available. I wouldn't be surprised if that code still ran, but without the documentation I wasn't about to try.

data_null__ · Posted 03-09-2020 05:20 PM

@ballardw wrote:

Proc sort data=have out=have uniqueout=want nouniquekey;
   by po ;
run;

Note: the OUT= data set will not have the same observations as DATA=, as might be mistakenly implied by your example: data=have out=have

ballardw · Posted 03-09-2020 05:52 PM

@data_null__ wrote:
Note: the OUT= data set will not have the same observations as DATA=, as might be mistakenly implied by your example: data=have out=have

I thought I was getting an error without the out=have, and probably should have named it Reduced or similar.

data_null__ · Posted 03-09-2020 06:05 PM

out=_null_

If you don't need the dups.
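
For reference, here is a sketch of the PROC SORT approach from this thread with a separate OUT= data set, so that HAVE is not overwritten. The name dups is hypothetical; per the suggestion above, out=_null_ could stand in for it when the duplicate rows are not needed.

```sas
/* NOUNIQUEKEY drops observations with a unique PO from OUT=;   */
/* UNIQUEOUT= collects those dropped (unique-key) observations. */
/* dups is a hypothetical name for the rows with repeated POs.  */
proc sort data=have out=dups uniqueout=want nouniquekey;
   by po;
run;
```

With the sample data, WANT should hold the rows for PO 1212 and 1345, and DUPS the rows for PO 1233 and 1350.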
