I have a large data set:
Have:
PO Amount Line
1212 300 1
1233 250 1
1233 600 2
1345 1520 1
1350 1000 2
1350 500 3
Want:
PO Amount Line
1212 300 1
1345 1520 1
I know the nodupkey will remove the duplicates, however, I am trying to remove both if there are duplicate.
Hi @Mchan890 Please try
data have;
input PO Amount Line;
cards;
1212 300 1
1233 250 1
1233 600 2
1345 1520 1
1350 1000 2
1350 500 3
;
data want;
set have;
by po;
if first.po and last.po;
run;
/*or*/
proc sql;
create table want as
select *
from have
group by po
having count(*)=1;
quit;
PROC SORT has a UNIQUERECS UNIQUEOUT option as well.
EDIT: corrected, as per @ballardw code - which is a correct answer.
Thank you @Reeza learned something new
Since you want unique values of PO then:
data have; input PO Amount Line ; datalines; 1212 300 1 1233 250 1 1233 600 2 1345 1520 1 1350 1000 2 1350 500 3 ; Proc sort data=have out=have uniqueout=want nouniquekey; by po ; run;
The data set you want is the one specified by the UNIQUEOUT option and the NOUNIQUEKEY is the appropriate equivalent to NODUPEKEY.
Please note how the data was provided as data step code so that we have something to test with.
Hmm Okay, so it's called UNIQUEOUT . I will have to remember that. Thank you!
Good point, I think it would help to keep abreast of which ones are active and not. I suppose there must be some docs, but really is confusing. Sir @ballardw is very good in knowing the latest stuff in the docs
@Reeza Your lower is most people's unattainable!
@Reeza wrote:
There used to be a UNIQUERECS option, it was removed because it was confusing and didn't work the way people expected and then they added UNIQUEOUT so I get them confused sometimes 🙂
I'm glad I wasn't the only one remembering another no longer available option. I wouldn't be surprised that code would run but without the documentation I wasn't about to try.
@ballardw wrote:
Since you want unique values of PO then:
data have; input PO Amount Line ; datalines; 1212 300 1 1233 250 1 1233 600 2 1345 1520 1 1350 1000 2 1350 500 3 ; Proc sort data=have out=have uniqueout=want nouniquekey; by po ; run;The data set you want is the one specified by the UNIQUEOUT option and the NOUNIQUEKEY is the appropriate equivalent to NODUPEKEY.
Please note how the data was provided as data step code so that we have something to test with.
Note: OUT= data will not have the same observations, as DATA= as might be mistakenly implied by your example: data=have out=have
@data_null__ wrote:
@ballardw wrote:
Since you want unique values of PO then:
data have; input PO Amount Line ; datalines; 1212 300 1 1233 250 1 1233 600 2 1345 1520 1 1350 1000 2 1350 500 3 ; Proc sort data=have out=have uniqueout=want nouniquekey; by po ; run;The data set you want is the one specified by the UNIQUEOUT option and the NOUNIQUEKEY is the appropriate equivalent to NODUPEKEY.
Please note how the data was provided as data step code so that we have something to test with.
Note: OUT= data will not have the same observations, as DATA= as might be mistakenly implied by your example: data=have out=have
I thought I was getting an error without the out=have, and probably should have named it Reduced or similar.
out=_null_
If you don't need the dups.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.