Hi Friends - i have one SAS dataset with around 1/2 millions of records with some duplicate records on field SKU. I want to know which SKU has maximum duplicate record itself and which has minimum duplicate records for same field, SKU?
Can someone please tell me how can i do this?
Thanks!
proc sql;
create table max_min as
select SKU, count(SKU) as duplicates
from your_table
group by SKU
having duplicates > 1
order by duplicates desc
;
quit;
You will get only duplicate values, unique ones will not show up.
First record will be SKU with maximum duplicates.
proc sql;
create table max_min as
select SKU, count(SKU) as duplicates
from your_table
group by SKU
having duplicates > 1
order by duplicates desc
;
quit;
You will get only duplicate values, unique ones will not show up.
First record will be SKU with maximum duplicates.
Updated slightly, with this you can select any number of min/max values, in this instance I want bottom and top 2 records:
proc sql;
create table WORK.WANT as
select *
from (
select TYPE,
COUNT(TYPE) as COUNT_OF_TYPE
from SASHELP.CARS
group by TYPE
)
order by COUNT_OF_TYPE;
select COUNT(TYPE)
into :NUM_OBS
from WORK.WANT;
quit;
data want;
set want;
if _n_ <= 2 or _n_ > (&NUM_OBS. - 2) then output;
run;
The query in the first response is also doable in the "Query Builder" in exactly the same way, and you can then post-process it with additional queries, the SORT task, and the RANK task.
Tom
Thanks Stataddict, RW9 and Tom.
Everything works grt...!!!
I also did validate number of observation using PROC REPORT.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.