Hello,
I would like to do the following:
Create a frequency report that count number of customers in each group.
I would like to add the following condtion :
1-I want that the most frequent group will be displayed first and so on (Desending order).
2-I want to display only 6 groups (First 5 groups and the 6th group will get the total count for all groups that are ranked above 5th group)
For example:
The frequency for each group is:
c 10
b 8
a 6
d 6
e 5
f 3
g 2
h 1
i 1
j 1
k 1
l 1
m 2
n 1
o 2
So wanted data set should be :
c 10
b 8
a 6
d 6
e 5
Other 15
What is the way to do it please? (Get 6 groups)?
Data have;
Input ID group $;
cards;
1 c
2 c
3 c
4 c
5 c
6 c
7 c
8 c
9 c
10 c
11 b
12 b
13 b
14 b
15 b
16 b
17 b
18 b
19 a
20 a
21 a
22 a
23 a
24 a
25 d
26 d
27 d
28 d
29 d
30 d
31 e
32 e
33 e
34 e
35 e
36 f
37 f
38 f
39 g
40 g
41 h
42 i
43 j
44 k
45 l
46 m
47 m
48 n
49 o
50 o
;
Run;
Data wanted;
input group $ count;
cards;
c 10
b 8
a 6
d 6
e 5
Other 15
;
Run;
I think that this solution can work well but maybe there is a better way (shorter?)
proc sql;
create table t1 AS
select group,
count(*) as nr
from have
group by group
order by nr desc
;
quit;
data t2;
Serial + 1 ;
set t1;
run ;
proc sql;
create table t3 AS
select 'Other' as group,
count(*) as nr
from t2
where Serial >5
;
quit;
Data wanted;
set t2(where=(Serial <=5) t3 ;
Run;
@Ronein wrote:
Hello,
I would like to do the following:
Create a frequency report that count number of customers in each group.
I would like to add the following condtion :
1-I want that the most frequent group will be displayed first and so on (Desending order).
2-I want to display only 6 groups (First 5 groups and the 6th group will get the total count for all groups that are ranked above 5th group)
For example:
The frequency for each group is:
c 10
b 8
a 6
d 6
e 5f 3
g 2
h 1
i 1
j 1
k 1
l 1
m 2
n 1
o 2
So wanted data set should be :
c 10
b 8
a 6
d 6
e 5Other 15
What is the way to do it please? (Get 6 groups)?
Data have; Input ID group $; cards; 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9 c 10 c 11 b 12 b 13 b 14 b 15 b 16 b 17 b 18 b 19 a 20 a 21 a 22 a 23 a 24 a 25 d 26 d 27 d 28 d 29 d 30 d 31 e 32 e 33 e 34 e 35 e 36 f 37 f 38 f 39 g 40 g 41 h 42 i 43 j 44 k 45 l 46 m 47 m 48 n 49 o 50 o ; Run; Data wanted; input group $ count; cards; c 10 b 8 a 6 d 6 e 5 Other 15 ; Run;
data counts_fmt;
set counts (OBS=3) end=eof;
length label $10.;
fmtname = 'Age_fmt';
start = age;
label = put(age, 8. -l);
output;
if eof then do;
hlo = 'O';
label = 'Other';
output;
end;
run;
proc format cntlin=counts_fmt;
run;
Assuming I was doing this for an age variable and taking the top 3 instead of 6.
PROC PARETO does this for you, if you have SAS/QC licensed. Here is an example: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/qcug/qcug_pareto_examples01.htm
You would want to use the MAXNCAT=5 option to get counts for the highest 5 categories, and then the remaining categories would be set to whatever name is specified by the OTHER= option.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.