Re: getting a group of observations with maximum count

Reddi · Posted 02-28-2017 04:25 AM

Hi ,

i have the date in the below format and need to get the group of ID which is having maximum count/observations( In the below example CML0005) . It's just an example , i've a millions of records and need to get the ID having maximum number of observations.

data claims;

input ID $ loss_code $;

cards4;

CML0001 27

CML0002 27C

CML0002 27

CML0003 27Y

CML0003 27C

CML0003 27P

CML0004 27O

CML0004 27Y

CML0004 27

CML0004 27P

CML0005 27P

CML0005 27O

CML0005 27C

CML0005 27

CML0006 27Y

CML0006 27C

CML0007 27O

CML0007 27Y

CML0008 27O

CML0009 27P

;;;;

run;

Many thnaks in advance...!

RW9 · Posted 02-28-2017 04:29 AM

Post test data in the form of a datastep, as such this code is a just a guess:

proc sql;
  create table WANT as
  select  distinct ID
  from    HAVE
  group by ID
  having count(*)=max(count(*));
quit;

Reddi · Posted 02-28-2017 04:34 AM

Sorry ,

you can use the below code to create a data set.

data claims;

input ID $ loss_code $;

cards4;

CML0001 27

CML0002 27C

CML0002 27

CML0003 27Y

CML0003 27C

CML0003 27P

CML0004 27O

CML0004 27Y

CML0004 27

CML0004 27P

CML0005 27P

CML0005 27O

CML0005 27C

CML0005 27

CML0006 27Y

CML0006 27C

CML0007 27O

CML0007 27Y

CML0008 27O

CML0009 27P

;;;;

run;

PeterClemmensen · Posted 02-28-2017 04:31 AM

Like this? 🙂

data have;
input ID $ Codes $;
infile datalines;
datalines;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
CML0005	27O
;

proc sql;
   create table want as
   select ID, count(ID) as countid
   from have
   group by ID;
quit;

proc sort data = want;
   by descending countid;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Astounding · Posted 02-28-2017 09:18 AM

Here is another possibility, that provides a bit more information just in case you need it. For example, it is conceivable that you need to find the dropoff ... for the ID with the highest count, how many more does it have compared to the next highest count.

proc freq data=claims order=freq;

tables ID / noprint out=want (keep=id count);

run;

This gives you an output data set with a count of observations for every ID in your data set. The order is from highest to lowest count, so the first observation has the highest count. Note that it is possible there are ties, and 5 different IDs have the same "highest" count.

Reddi · Posted 03-01-2017 03:45 AM

Hi Astounding ,

Thanks but i'm looking for the highest count ID/ID's only not all the ID's with individual count.

PeterClemmensen · Posted 03-01-2017 04:13 AM

What do you want to do with the highest count ID/ID's, do you just want them in a dataset or to read it into a macrovariable?

The DATA to DATA Step Macro
Blog: SASnrd

Reddi · Posted 03-01-2017 09:03 AM

Hi ,
Either way is good for me

nehalsanghvi · Posted 03-01-2017 09:37 AM

The code RW9 posted above will get you what you're looking for: only those ID/IDs with the highest count. If you also want the count included in the result along with the ID, add 'count(*)' to the select statement, the rest of the query remains the same.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away