Hello, I basically need to find out how to calculate the mode / the most frequent value from a column of strings, not numbers.
This is the data I have: many customers for many days
data Have;
input ID callDate :ddmmyy. employment;
format callDate ddmmyy10.;
datalines;
1 15/02/2021 full-time
1 17/02/2021 full-time
1 19/02/2021 full-time
2 10/02/2021 part-time
2 15/02/2021 part-time
3 17/02/2021 full-time
3 19/02/2021 full-time
3 20/02/2021 full-time
4 08/02/2021 full-time
4 10/02/2021 full-time
4 13/02/2021 full-time
4 15/02/2021 full-time
4 17/02/2021 full-time
4 19/02/2021 full-time
5 10/02/2021 unemployed
;
run;
I am creating a table of statistics for each day of my table, where I count some variables and I average others. I owuld be interested in getting the mode per each day. The basic structre of this code is the following:
proc sql;
create table daily_stats as
select calldate
, count( distinct id) as id_number
, *insert mode code here*
from have
group by calldate;
quit;
It would have to look like this:
data daily_stats;
input callDate :ddmmyy. id_number mode;
format callDate ddmmyy10.;
datalines;
08/02/2021 1 full-time
13/02/2021 1 full-time
10/02/2021 3 (to be fair i don't know what the answer would be in this case)
15/02/2021 3 full-time
17/02/2021 3 full-time
19/02/2021 3 full-time
20/02/2021 1 full-time
;
run;
If you can think of an answer in PROC SQL it would be great. Thanks!
Hello @catkat96,
Here's one suggestion:
proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, (select max(e) from
(select e from
(select employment as e, count(*) as c
from have b
where b.calldate=a.calldate
group by e)
having c=max(c))) as mode
from have a
group by calldate;
quit;
Glad to hear that it works.
So you have more categorical variables like employment. Here's an example including a second categorical variable, catvar2.
data Have;
input ID callDate :ddmmyy. employment :$15. catvar2 :$1.;
format callDate ddmmyy10.;
datalines;
1 15/02/2021 full-time A
1 17/02/2021 full-time B
1 19/02/2021 full-time B
2 10/02/2021 part-time A
2 15/02/2021 part-time C
3 17/02/2021 full-time C
3 19/02/2021 full-time A
3 20/02/2021 full-time A
4 08/02/2021 full-time B
4 10/02/2021 full-time A
4 13/02/2021 full-time A
4 15/02/2021 full-time C
4 17/02/2021 full-time A
4 19/02/2021 full-time B
5 10/02/2021 unemployed B
;
To get more compact PROC SQL code you can put the repetitive part of the SELECT statement into a macro:
%macro mode(var);
(select max(x) from
(select x from
(select &var as x, count(*) as c
from have b
where b.calldate=a.calldate
group by x)
having c=max(c))) as mode_&var
%mend mode;
Then you just need a short macro call per categorical variable in the PROC SQL step:
proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, %mode(employment)
, %mode(catvar2)
from have a
group by calldate;
quit;
Result:
mode_ mode_ Obs callDate id_number employment catvar2 1 08/02/2021 1 full-time B 2 10/02/2021 3 unemployed A 3 13/02/2021 1 full-time A 4 15/02/2021 3 full-time C 5 17/02/2021 3 full-time C 6 19/02/2021 3 full-time B 7 20/02/2021 1 full-time A
This works for character and numeric categorical variables.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.