Hello,
I would like to do the following:
Create a frequency report that count number of customers in each group.
I would like to add the following condtion :
1-I want that the most frequent group will be displayed first and so on (Desending order).
2-I want to display only 6 groups (First 5 groups and the 6th group will get the total count for all groups that are ranked above 5th group)
For example:
The frequency for each group is:
c 10
b 8
a 6
d 6
e 5
f 3
g 2
h 1
i 1
j 1
k 1
l 1
m 2
n 1
o 2
So wanted data set should be :
c 10
b 8
a 6
d 6
e 5
Other 15
What is the way to do it please? (Get 6 groups)?
Data have;
Input ID group $;
cards;
1 c
2 c
3 c
4 c
5 c
6 c
7 c
8 c
9 c
10 c
11 b
12 b
13 b
14 b
15 b
16 b
17 b
18 b
19 a
20 a
21 a
22 a
23 a
24 a
25 d
26 d
27 d
28 d
29 d
30 d
31 e
32 e
33 e
34 e
35 e
36 f
37 f
38 f
39 g
40 g
41 h
42 i
43 j
44 k
45 l
46 m
47 m
48 n
49 o
50 o
;
Run;
Data wanted;
input group $ count;
cards;
c 10
b 8
a 6
d 6
e 5
Other 15
;
Run;
I think that this solution can work well but maybe there is a better way (shorter?)
proc sql;
create table t1 AS
select group,
count(*) as nr
from have
group by group
order by nr desc
;
quit;
data t2;
Serial + 1 ;
set t1;
run ;
proc sql;
create table t3 AS
select 'Other' as group,
count(*) as nr
from t2
where Serial >5
;
quit;
Data wanted;
set t2(where=(Serial <=5) t3 ;
Run;
@Ronein wrote:
Hello,
I would like to do the following:
Create a frequency report that count number of customers in each group.
I would like to add the following condtion :
1-I want that the most frequent group will be displayed first and so on (Desending order).
2-I want to display only 6 groups (First 5 groups and the 6th group will get the total count for all groups that are ranked above 5th group)
For example:
The frequency for each group is:
c 10
b 8
a 6
d 6
e 5f 3
g 2
h 1
i 1
j 1
k 1
l 1
m 2
n 1
o 2
So wanted data set should be :
c 10
b 8
a 6
d 6
e 5Other 15
What is the way to do it please? (Get 6 groups)?
Data have; Input ID group $; cards; 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9 c 10 c 11 b 12 b 13 b 14 b 15 b 16 b 17 b 18 b 19 a 20 a 21 a 22 a 23 a 24 a 25 d 26 d 27 d 28 d 29 d 30 d 31 e 32 e 33 e 34 e 35 e 36 f 37 f 38 f 39 g 40 g 41 h 42 i 43 j 44 k 45 l 46 m 47 m 48 n 49 o 50 o ; Run; Data wanted; input group $ count; cards; c 10 b 8 a 6 d 6 e 5 Other 15 ; Run;
data counts_fmt;
set counts (OBS=3) end=eof;
length label $10.;
fmtname = 'Age_fmt';
start = age;
label = put(age, 8. -l);
output;
if eof then do;
hlo = 'O';
label = 'Other';
output;
end;
run;
proc format cntlin=counts_fmt;
run;
Assuming I was doing this for an age variable and taking the top 3 instead of 6.
PROC PARETO does this for you, if you have SAS/QC licensed. Here is an example: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/qcug/qcug_pareto_examples01.htm
You would want to use the MAXNCAT=5 option to get counts for the highest 5 categories, and then the remaining categories would be set to whatever name is specified by the OTHER= option.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.