Desktop productivity for business analysts and programmers

counting obs. that have maximum duplicate records

Accepted Solution Solved
Reply
Contributor
Posts: 55
Accepted Solution

counting obs. that have maximum duplicate records

Hi Friends - i have one SAS dataset with around 1/2 millions of records with some duplicate records on field SKU. I want to know which SKU has maximum duplicate record itself and which has minimum duplicate records for same field, SKU?

Can someone please tell me how can i do this?

Thanks!


Accepted Solutions
Solution
‎03-11-2014 03:01 AM
Occasional Contributor
Posts: 14

Re: counting obs. that have maximum duplicate records

proc sql;

create table max_min as

select SKU, count(SKU) as duplicates

from your_table

group by SKU

having duplicates > 1

order by duplicates desc

;

quit;

You will get only duplicate values, unique ones will not show up.

First record will be SKU with maximum duplicates.

View solution in original post


All Replies
Solution
‎03-11-2014 03:01 AM
Occasional Contributor
Posts: 14

Re: counting obs. that have maximum duplicate records

proc sql;

create table max_min as

select SKU, count(SKU) as duplicates

from your_table

group by SKU

having duplicates > 1

order by duplicates desc

;

quit;

You will get only duplicate values, unique ones will not show up.

First record will be SKU with maximum duplicates.

Super User
Super User
Posts: 7,727

Re: counting obs. that have maximum duplicate records

Updated slightly, with this you can select any number of min/max values, in this instance I want bottom and top 2 records:

proc sql;

  create table WORK.WANT as

  select  *

  from    (

            select  TYPE,

                    COUNT(TYPE) as COUNT_OF_TYPE

            from    SASHELP.CARS

            group by TYPE

          )

  order by COUNT_OF_TYPE;

  select  COUNT(TYPE)

  into    :NUM_OBS

  from    WORK.WANT;

quit;

data want;

  set want;

  if _n_ <= 2 or _n_ > (&NUM_OBS. - 2) then output;

run;

PROC Star
Posts: 1,146

Re: counting obs. that have maximum duplicate records

The query in the first response is also doable in the "Query Builder" in exactly the same way, and you can then post-process it with additional queries, the SORT task, and the RANK task.

Tom

Contributor
Posts: 55

Re: counting obs. that have maximum duplicate records

Thanks Stataddict, RW9 and Tom.

Everything works grt...!!!

I also did validate number of observation using PROC REPORT.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 565 views
  • 6 likes
  • 4 in conversation