BookmarkSubscribeRSS Feed
Ronein
Onyx | Level 15

Hello,

I would like to do the following:

Create a frequency report that count number of customers in each group.

I would like to add the following condtion :

1-I want that the most frequent group will be displayed first and so on (Desending order).

2-I want to display only 6 groups (First 5 groups and the 6th group will get the total count for all groups that are ranked above 5th group)

For example:

The frequency for each group is:

c 10
b 8
a 6
d 6
e 5

f 3 

g 2

h 1

i 1

j 1

k 1

l 1 

m 2

n 1

o 2

So wanted data set should be :

c 10
b 8
a 6
d 6
e 5

Other 15

 

What is the way to do it please? (Get 6 groups)?


Data have;
Input ID  group $;
cards;
1 c
2 c
3 c
4 c
5 c
6 c
7 c
8 c
9 c
10 c
11 b
12 b
13 b
14 b
15 b
16 b
17 b
18 b
19 a
20 a
21 a
22 a
23 a
24 a
25 d
26 d
27 d
28 d
29 d
30 d
31 e
32 e
33 e
34 e
35 e
36 f
37 f
38 f
39 g
40 g
41 h
42  i
43 j
44 k
45 l
46 m
47 m
48 n
49 o
50 o
;
Run;

Data wanted;
input group $ count;
cards;
c 10
b 8
a 6
d 6
e 5
Other 15
;
Run;

 

7 REPLIES 7
Ronein
Onyx | Level 15

I think that this solution can work well but maybe there is a better way (shorter?)

proc sql;
create table t1 AS
select group,
       count(*) as nr
from have
group by  group
order by nr desc
;
quit;

data t2;
Serial + 1 ;
set t1;
run ;

proc sql;
create table t3 AS
select 'Other' as group,
       count(*) as nr
from t2
where Serial >5
;
quit;

Data wanted;
set t2(where=(Serial <=5)  t3 ;
Run;

Reeza
Super User
  1. Do a proc freq and get summaries into a data set
  2. Sort by frequency to get top 6 versus others
  3. Create a custom format that groups the top N into individual categories and everything else goes to other. Use PROC FORMAT + CNTLIN to do this step
  4. Re-run Step1 applying custom format from #3

@Ronein wrote:

Hello,

I would like to do the following:

Create a frequency report that count number of customers in each group.

I would like to add the following condtion :

1-I want that the most frequent group will be displayed first and so on (Desending order).

2-I want to display only 6 groups (First 5 groups and the 6th group will get the total count for all groups that are ranked above 5th group)

For example:

The frequency for each group is:

c 10
b 8
a 6
d 6
e 5

f 3 

g 2

h 1

i 1

j 1

k 1

l 1 

m 2

n 1

o 2

So wanted data set should be :

c 10
b 8
a 6
d 6
e 5

Other 15

 

What is the way to do it please? (Get 6 groups)?


Data have;
Input ID  group $;
cards;
1 c
2 c
3 c
4 c
5 c
6 c
7 c
8 c
9 c
10 c
11 b
12 b
13 b
14 b
15 b
16 b
17 b
18 b
19 a
20 a
21 a
22 a
23 a
24 a
25 d
26 d
27 d
28 d
29 d
30 d
31 e
32 e
33 e
34 e
35 e
36 f
37 f
38 f
39 g
40 g
41 h
42  i
43 j
44 k
45 l
46 m
47 m
48 n
49 o
50 o
;
Run;

Data wanted;
input group $ count;
cards;
c 10
b 8
a 6
d 6
e 5
Other 15
;
Run;

 


 

Ronein
Onyx | Level 15
Thanks, may you please show code for your solution?
Reeza
Super User
I'm fairly certain you can figure out the code 🙂
If you need help post your code and issues.

Not sure it's any more efficient than your current solution though.
Ronein
Onyx | Level 15
May you please show step 3?
Reeza
Super User
data counts_fmt;
set counts (OBS=3) end=eof;
length label $10.;

fmtname = 'Age_fmt';
start = age;
label = put(age, 8. -l);

output;
if eof then do;
hlo = 'O';
label = 'Other';
output;
end;

run;

proc format cntlin=counts_fmt;
run;

Assuming I was doing this for an age variable and taking the top 3 instead of 6.

PaigeMiller
Diamond | Level 26

PROC PARETO does this for you, if you have SAS/QC licensed. Here is an example: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/qcug/qcug_pareto_examples01.htm

 

You would want to use the MAXNCAT=5 option to get counts for the highest 5 categories, and then the remaining categories would be set to whatever name is specified by the OTHER= option.

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1354 views
  • 3 likes
  • 3 in conversation