Hi Friends - i have one SAS dataset with around 1/2 millions of records with some duplicate records on field SKU. I want to know which SKU has maximum duplicate record itself and which has minimum duplicate records for same field, SKU?
Can someone please tell me how can i do this?
Thanks!
proc sql;
create table max_min as
select SKU, count(SKU) as duplicates
from your_table
group by SKU
having duplicates > 1
order by duplicates desc
;
quit;
You will get only duplicate values, unique ones will not show up.
First record will be SKU with maximum duplicates.
proc sql;
create table max_min as
select SKU, count(SKU) as duplicates
from your_table
group by SKU
having duplicates > 1
order by duplicates desc
;
quit;
You will get only duplicate values, unique ones will not show up.
First record will be SKU with maximum duplicates.
Updated slightly, with this you can select any number of min/max values, in this instance I want bottom and top 2 records:
proc sql;
create table WORK.WANT as
select *
from (
select TYPE,
COUNT(TYPE) as COUNT_OF_TYPE
from SASHELP.CARS
group by TYPE
)
order by COUNT_OF_TYPE;
select COUNT(TYPE)
into :NUM_OBS
from WORK.WANT;
quit;
data want;
set want;
if _n_ <= 2 or _n_ > (&NUM_OBS. - 2) then output;
run;
The query in the first response is also doable in the "Query Builder" in exactly the same way, and you can then post-process it with additional queries, the SORT task, and the RANK task.
Tom
Thanks Stataddict, RW9 and Tom.
Everything works grt...!!!
I also did validate number of observation using PROC REPORT.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.