DATA Step, Macro, Functions and more

getting a group of observations with maximum count

Reply
Contributor
Posts: 23

getting a group of observations with maximum count

[ Edited ]

Hi ,

 

i have the date in the below format  and need to get the group of ID which is having maximum count/observations( In the below example CML0005) . It's just an example , i've a millions of records and need to get the ID having maximum number of observations.

data claims;

input ID $ loss_code $;

cards4;

CML0001 27

CML0002 27C

CML0002 27

CML0003 27Y

CML0003 27C

CML0003 27P

CML0004 27O

CML0004 27Y

CML0004 27

CML0004 27P

CML0005 27P

CML0005 27O

CML0005 27C

CML0005 27

CML0006 27Y

CML0006 27C

CML0007 27O

CML0007 27Y

CML0008 27O

CML0009 27P

;;;;

run;

 

Many thnaks in advance...!

Super User
Super User
Posts: 7,430

Re: getting a group of observations with maximum count

Post test data in the form of a datastep, as such this code is a just a guess:

proc sql;
  create table WANT as
  select  distinct ID
  from    HAVE
group by ID having count(*)=max(count(*)); quit;
Contributor
Posts: 23

Re: getting a group of observations with maximum count

Sorry ,

 

you can use the below code to create a data set.

 

data claims;

input ID $ loss_code $;

cards4;

CML0001 27

CML0002 27C

CML0002 27

CML0003 27Y

CML0003 27C

CML0003 27P

CML0004 27O

CML0004 27Y

CML0004 27

CML0004 27P

CML0005 27P

CML0005 27O

CML0005 27C

CML0005 27

CML0006 27Y

CML0006 27C

CML0007 27O

CML0007 27Y

CML0008 27O

CML0009 27P

;;;;

run;

 

PROC Star
Posts: 554

Re: getting a group of observations with maximum count

Like this? Smiley Happy

 

data have;
input ID $ Codes $;
infile datalines;
datalines;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
CML0005	27O
;

proc sql;
   create table want as
   select ID, count(ID) as countid
   from have
   group by ID;
quit;

proc sort data = want;
   by descending countid;
run;
Super User
Posts: 5,097

Re: getting a group of observations with maximum count

Here is another possibility, that provides a bit more information just in case you need it.  For example, it is conceivable that you need to find the dropoff ... for the ID with the highest count, how many more does it have compared to the next highest count.

 

proc freq data=claims order=freq;

tables ID / noprint out=want (keep=id count);

run;

 

This gives you an output data set with a count of observations for every ID in your data set.  The order is from highest to lowest count, so the first observation has the highest count.  Note that it is possible there are ties, and 5 different IDs have the same "highest" count. 

Contributor
Posts: 23

Re: getting a group of observations with maximum count

Hi Astounding ,

 

Thanks but i'm looking for the highest count ID/ID's only not all the ID's with individual count.

 

 

PROC Star
Posts: 554

Re: getting a group of observations with maximum count

What do you want to do with the highest count ID/ID's, do you just want them in a dataset or to read it into a macrovariable?

Contributor
Posts: 23

Re: getting a group of observations with maximum count

Hi ,
Either way is good for me
Frequent Contributor
Posts: 75

Re: getting a group of observations with maximum count

The code RW9 posted above will get you what you're looking for: only those ID/IDs with the highest count. If you also want the count included in the result along with the ID, add 'count(*)' to the select statement, the rest of the query remains the same.

Ask a Question
Discussion stats
  • 8 replies
  • 166 views
  • 0 likes
  • 5 in conversation