BookmarkSubscribeRSS Feed
catkat96
Obsidian | Level 7

Hello, I basically need to find out how to calculate the mode / the most frequent value from a column of strings, not numbers. 

 

This is the data I have: many customers for many days

data Have;
	input ID callDate :ddmmyy. employment;
	format callDate ddmmyy10.;
	datalines;
1 15/02/2021 full-time
1 17/02/2021 full-time
1 19/02/2021 full-time
2 10/02/2021 part-time
2 15/02/2021 part-time
3 17/02/2021 full-time
3 19/02/2021 full-time
3 20/02/2021 full-time
4 08/02/2021 full-time
4 10/02/2021 full-time
4 13/02/2021 full-time
4 15/02/2021 full-time
4 17/02/2021 full-time
4 19/02/2021 full-time
5 10/02/2021 unemployed
;
run;

I am creating a table of statistics for each day of my table, where I count some variables and I average others. I owuld be interested in getting the mode per each day. The basic structre of this code is the following:

proc sql;
create table daily_stats as
select calldate
, count( distinct id) as id_number
, *insert mode code here*

from have
group by calldate;
quit;

It would have to look like this:

data daily_stats;
	input callDate :ddmmyy. id_number mode;
	format callDate ddmmyy10.;
	datalines;
08/02/2021 1 full-time
13/02/2021 1 full-time
10/02/2021 3 (to be fair i don't know what the answer would be in this case)
15/02/2021 3 full-time
17/02/2021 3 full-time
19/02/2021 3 full-time
20/02/2021 1 full-time
;
run;

If you can think of an answer in PROC SQL it would be great. Thanks!

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @catkat96,

 

Here's one suggestion:

proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, (select max(e) from
    (select e from
      (select employment as e, count(*) as c
       from have b
       where b.calldate=a.calldate
       group by e)
     having c=max(c))) as mode
from have a
group by calldate;
quit;

 

  1. The innermost query creates a frequency table of the EMPLOYMENT values within the current CALLDATE value ("BY group").
  2. Then the EMPLOYMENT value(s) -- plural in the case of ties -- having the maximum frequency are selected.
  3. From this list of tied distinct EMPLOYMENT value(s) you can select what you like:
    • my suggestion max(e), i.e., the last value in alphabetical order
    • or min(e), i.e., the first value in alphabetical order
    • or something else (e.g., by using a HAVING clause on that nesting level).
catkat96
Obsidian | Level 7
That is really great, it works fine!
I do wonder if you can think of a way to get many modes at the same time in a compressed manner? I've been copying this code for each mode I need to calculate but I have to do a few so it became very long, and maybe you have an idea 🙂
FreelanceReinh
Jade | Level 19

Glad to hear that it works.

 

So you have more categorical variables like employment.  Here's an example including a second categorical variable, catvar2.

data Have;
 input ID callDate :ddmmyy. employment :$15. catvar2 :$1.;
 format callDate ddmmyy10.;
 datalines;
1 15/02/2021 full-time A
1 17/02/2021 full-time B
1 19/02/2021 full-time B
2 10/02/2021 part-time A
2 15/02/2021 part-time C
3 17/02/2021 full-time C
3 19/02/2021 full-time A
3 20/02/2021 full-time A
4 08/02/2021 full-time B
4 10/02/2021 full-time A
4 13/02/2021 full-time A
4 15/02/2021 full-time C
4 17/02/2021 full-time A
4 19/02/2021 full-time B
5 10/02/2021 unemployed B
;

To get more compact PROC SQL code you can put the repetitive part of the SELECT statement into a macro:

%macro mode(var);
(select max(x) from
  (select x from
    (select &var as x, count(*) as c
     from have b
     where b.calldate=a.calldate
     group by x)
   having c=max(c))) as mode_&var
%mend mode;

Then you just need a short macro call per categorical variable in the PROC SQL step:

proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, %mode(employment)
, %mode(catvar2)
from have a
group by calldate;
quit;

Result:

                                    mode_        mode_
Obs     callDate     id_number    employment    catvar2

 1     08/02/2021        1        full-time        B
 2     10/02/2021        3        unemployed       A
 3     13/02/2021        1        full-time        A
 4     15/02/2021        3        full-time        C
 5     17/02/2021        3        full-time        C
 6     19/02/2021        3        full-time        B
 7     20/02/2021        1        full-time        A

This works for character and numeric categorical variables.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 541 views
  • 1 like
  • 2 in conversation