BookmarkSubscribeRSS Feed
Reddi
Fluorite | Level 6

Hi ,

 

i have the date in the below format  and need to get the group of ID which is having maximum count/observations( In the below example CML0005) . It's just an example , i've a millions of records and need to get the ID having maximum number of observations.

data claims;

input ID $ loss_code $;

cards4;

CML0001 27

CML0002 27C

CML0002 27

CML0003 27Y

CML0003 27C

CML0003 27P

CML0004 27O

CML0004 27Y

CML0004 27

CML0004 27P

CML0005 27P

CML0005 27O

CML0005 27C

CML0005 27

CML0006 27Y

CML0006 27C

CML0007 27O

CML0007 27Y

CML0008 27O

CML0009 27P

;;;;

run;

 

Many thnaks in advance...!

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep, as such this code is a just a guess:

proc sql;
  create table WANT as
  select  distinct ID
  from    HAVE
group by ID having count(*)=max(count(*)); quit;
Reddi
Fluorite | Level 6

Sorry ,

 

you can use the below code to create a data set.

 

data claims;

input ID $ loss_code $;

cards4;

CML0001 27

CML0002 27C

CML0002 27

CML0003 27Y

CML0003 27C

CML0003 27P

CML0004 27O

CML0004 27Y

CML0004 27

CML0004 27P

CML0005 27P

CML0005 27O

CML0005 27C

CML0005 27

CML0006 27Y

CML0006 27C

CML0007 27O

CML0007 27Y

CML0008 27O

CML0009 27P

;;;;

run;

 

PeterClemmensen
Tourmaline | Level 20

Like this? 🙂

 

data have;
input ID $ Codes $;
infile datalines;
datalines;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
CML0005	27O
;

proc sql;
   create table want as
   select ID, count(ID) as countid
   from have
   group by ID;
quit;

proc sort data = want;
   by descending countid;
run;
Astounding
PROC Star

Here is another possibility, that provides a bit more information just in case you need it.  For example, it is conceivable that you need to find the dropoff ... for the ID with the highest count, how many more does it have compared to the next highest count.

 

proc freq data=claims order=freq;

tables ID / noprint out=want (keep=id count);

run;

 

This gives you an output data set with a count of observations for every ID in your data set.  The order is from highest to lowest count, so the first observation has the highest count.  Note that it is possible there are ties, and 5 different IDs have the same "highest" count. 

Reddi
Fluorite | Level 6

Hi Astounding ,

 

Thanks but i'm looking for the highest count ID/ID's only not all the ID's with individual count.

 

 

PeterClemmensen
Tourmaline | Level 20

What do you want to do with the highest count ID/ID's, do you just want them in a dataset or to read it into a macrovariable?

Reddi
Fluorite | Level 6
Hi ,
Either way is good for me
nehalsanghvi
Pyrite | Level 9

The code RW9 posted above will get you what you're looking for: only those ID/IDs with the highest count. If you also want the count included in the result along with the ID, add 'count(*)' to the select statement, the rest of the query remains the same.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3975 views
  • 2 likes
  • 5 in conversation