Hi ,
i have the date in the below format and need to get the group of ID which is having maximum count/observations( In the below example CML0005) . It's just an example , i've a millions of records and need to get the ID having maximum number of observations.
data claims;
input ID $ loss_code $;
cards4;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;;;;
run;
Many thnaks in advance...!
Post test data in the form of a datastep, as such this code is a just a guess:
proc sql; create table WANT as select distinct ID from HAVE
group by ID having count(*)=max(count(*)); quit;
Sorry ,
you can use the below code to create a data set.
data claims;
input ID $ loss_code $;
cards4;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;;;;
run;
Like this? 🙂
data have;
input ID $ Codes $;
infile datalines;
datalines;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
CML0005 27O
;
proc sql;
create table want as
select ID, count(ID) as countid
from have
group by ID;
quit;
proc sort data = want;
by descending countid;
run;
Here is another possibility, that provides a bit more information just in case you need it. For example, it is conceivable that you need to find the dropoff ... for the ID with the highest count, how many more does it have compared to the next highest count.
proc freq data=claims order=freq;
tables ID / noprint out=want (keep=id count);
run;
This gives you an output data set with a count of observations for every ID in your data set. The order is from highest to lowest count, so the first observation has the highest count. Note that it is possible there are ties, and 5 different IDs have the same "highest" count.
Hi Astounding ,
Thanks but i'm looking for the highest count ID/ID's only not all the ID's with individual count.
What do you want to do with the highest count ID/ID's, do you just want them in a dataset or to read it into a macrovariable?
The code RW9 posted above will get you what you're looking for: only those ID/IDs with the highest count. If you also want the count included in the result along with the ID, add 'count(*)' to the select statement, the rest of the query remains the same.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.