BookmarkSubscribeRSS Feed
catkat96
Obsidian | Level 7

Hello, I basically need to find out how to calculate the mode / the most frequent value from a column of strings, not numbers. 

 

This is the data I have: many customers for many days

data Have;
	input ID callDate :ddmmyy. employment;
	format callDate ddmmyy10.;
	datalines;
1 15/02/2021 full-time
1 17/02/2021 full-time
1 19/02/2021 full-time
2 10/02/2021 part-time
2 15/02/2021 part-time
3 17/02/2021 full-time
3 19/02/2021 full-time
3 20/02/2021 full-time
4 08/02/2021 full-time
4 10/02/2021 full-time
4 13/02/2021 full-time
4 15/02/2021 full-time
4 17/02/2021 full-time
4 19/02/2021 full-time
5 10/02/2021 unemployed
;
run;

I am creating a table of statistics for each day of my table, where I count some variables and I average others. I owuld be interested in getting the mode per each day. The basic structre of this code is the following:

proc sql;
create table daily_stats as
select calldate
, count( distinct id) as id_number
, *insert mode code here*

from have
group by calldate;
quit;

It would have to look like this:

data daily_stats;
	input callDate :ddmmyy. id_number mode;
	format callDate ddmmyy10.;
	datalines;
08/02/2021 1 full-time
13/02/2021 1 full-time
10/02/2021 3 (to be fair i don't know what the answer would be in this case)
15/02/2021 3 full-time
17/02/2021 3 full-time
19/02/2021 3 full-time
20/02/2021 1 full-time
;
run;

If you can think of an answer in PROC SQL it would be great. Thanks!

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @catkat96,

 

Here's one suggestion:

proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, (select max(e) from
    (select e from
      (select employment as e, count(*) as c
       from have b
       where b.calldate=a.calldate
       group by e)
     having c=max(c))) as mode
from have a
group by calldate;
quit;

 

  1. The innermost query creates a frequency table of the EMPLOYMENT values within the current CALLDATE value ("BY group").
  2. Then the EMPLOYMENT value(s) -- plural in the case of ties -- having the maximum frequency are selected.
  3. From this list of tied distinct EMPLOYMENT value(s) you can select what you like:
    • my suggestion max(e), i.e., the last value in alphabetical order
    • or min(e), i.e., the first value in alphabetical order
    • or something else (e.g., by using a HAVING clause on that nesting level).
catkat96
Obsidian | Level 7
That is really great, it works fine!
I do wonder if you can think of a way to get many modes at the same time in a compressed manner? I've been copying this code for each mode I need to calculate but I have to do a few so it became very long, and maybe you have an idea 🙂
FreelanceReinh
Jade | Level 19

Glad to hear that it works.

 

So you have more categorical variables like employment.  Here's an example including a second categorical variable, catvar2.

data Have;
 input ID callDate :ddmmyy. employment :$15. catvar2 :$1.;
 format callDate ddmmyy10.;
 datalines;
1 15/02/2021 full-time A
1 17/02/2021 full-time B
1 19/02/2021 full-time B
2 10/02/2021 part-time A
2 15/02/2021 part-time C
3 17/02/2021 full-time C
3 19/02/2021 full-time A
3 20/02/2021 full-time A
4 08/02/2021 full-time B
4 10/02/2021 full-time A
4 13/02/2021 full-time A
4 15/02/2021 full-time C
4 17/02/2021 full-time A
4 19/02/2021 full-time B
5 10/02/2021 unemployed B
;

To get more compact PROC SQL code you can put the repetitive part of the SELECT statement into a macro:

%macro mode(var);
(select max(x) from
  (select x from
    (select &var as x, count(*) as c
     from have b
     where b.calldate=a.calldate
     group by x)
   having c=max(c))) as mode_&var
%mend mode;

Then you just need a short macro call per categorical variable in the PROC SQL step:

proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, %mode(employment)
, %mode(catvar2)
from have a
group by calldate;
quit;

Result:

                                    mode_        mode_
Obs     callDate     id_number    employment    catvar2

 1     08/02/2021        1        full-time        B
 2     10/02/2021        3        unemployed       A
 3     13/02/2021        1        full-time        A
 4     15/02/2021        3        full-time        C
 5     17/02/2021        3        full-time        C
 6     19/02/2021        3        full-time        B
 7     20/02/2021        1        full-time        A

This works for character and numeric categorical variables.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 746 views
  • 1 like
  • 2 in conversation