Desktop productivity for business analysts and programmers

counting obs. that have maximum duplicate records

Accepted Solution Solved
Reply
Contributor
Posts: 55
Accepted Solution

counting obs. that have maximum duplicate records

Hi Friends - i have one SAS dataset with around 1/2 millions of records with some duplicate records on field SKU. I want to know which SKU has maximum duplicate record itself and which has minimum duplicate records for same field, SKU?

Can someone please tell me how can i do this?

Thanks!


Accepted Solutions
Solution
‎03-11-2014 03:01 AM
Occasional Contributor
Posts: 14

Re: counting obs. that have maximum duplicate records

proc sql;

create table max_min as

select SKU, count(SKU) as duplicates

from your_table

group by SKU

having duplicates > 1

order by duplicates desc

;

quit;

You will get only duplicate values, unique ones will not show up.

First record will be SKU with maximum duplicates.

View solution in original post


All Replies
Solution
‎03-11-2014 03:01 AM
Occasional Contributor
Posts: 14

Re: counting obs. that have maximum duplicate records

proc sql;

create table max_min as

select SKU, count(SKU) as duplicates

from your_table

group by SKU

having duplicates > 1

order by duplicates desc

;

quit;

You will get only duplicate values, unique ones will not show up.

First record will be SKU with maximum duplicates.

Super User
Super User
Posts: 9,855

Re: counting obs. that have maximum duplicate records

Posted in reply to stataddict

Updated slightly, with this you can select any number of min/max values, in this instance I want bottom and top 2 records:

proc sql;

  create table WORK.WANT as

  select  *

  from    (

            select  TYPE,

                    COUNT(TYPE) as COUNT_OF_TYPE

            from    SASHELP.CARS

            group by TYPE

          )

  order by COUNT_OF_TYPE;

  select  COUNT(TYPE)

  into    :NUM_OBS

  from    WORK.WANT;

quit;

data want;

  set want;

  if _n_ <= 2 or _n_ > (&NUM_OBS. - 2) then output;

run;

PROC Star
Posts: 1,334

Re: counting obs. that have maximum duplicate records

Posted in reply to stataddict

The query in the first response is also doable in the "Query Builder" in exactly the same way, and you can then post-process it with additional queries, the SORT task, and the RANK task.

Tom

Contributor
Posts: 55

Re: counting obs. that have maximum duplicate records

Thanks Stataddict, RW9 and Tom.

Everything works grt...!!!

I also did validate number of observation using PROC REPORT.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 733 views
  • 6 likes
  • 4 in conversation