Hello, I basically need to find out how to calculate the mode / the most frequent value from a column of strings, not numbers.
This is the data I have: many customers for many days
data Have;
input ID callDate :ddmmyy. employment;
format callDate ddmmyy10.;
datalines;
1 15/02/2021 full-time
1 17/02/2021 full-time
1 19/02/2021 full-time
2 10/02/2021 part-time
2 15/02/2021 part-time
3 17/02/2021 full-time
3 19/02/2021 full-time
3 20/02/2021 full-time
4 08/02/2021 full-time
4 10/02/2021 full-time
4 13/02/2021 full-time
4 15/02/2021 full-time
4 17/02/2021 full-time
4 19/02/2021 full-time
5 10/02/2021 unemployed
;
run;
I am creating a table of statistics for each day of my table, where I count some variables and I average others. I owuld be interested in getting the mode per each day. The basic structre of this code is the following:
proc sql;
create table daily_stats as
select calldate
, count( distinct id) as id_number
, *insert mode code here*
from have
group by calldate;
quit;
It would have to look like this:
data daily_stats;
input callDate :ddmmyy. id_number mode;
format callDate ddmmyy10.;
datalines;
08/02/2021 1 full-time
13/02/2021 1 full-time
10/02/2021 3 (to be fair i don't know what the answer would be in this case)
15/02/2021 3 full-time
17/02/2021 3 full-time
19/02/2021 3 full-time
20/02/2021 1 full-time
;
run;
If you can think of an answer in PROC SQL it would be great. Thanks!
Hello @catkat96,
Here's one suggestion:
proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, (select max(e) from
(select e from
(select employment as e, count(*) as c
from have b
where b.calldate=a.calldate
group by e)
having c=max(c))) as mode
from have a
group by calldate;
quit;
Glad to hear that it works.
So you have more categorical variables like employment. Here's an example including a second categorical variable, catvar2.
data Have;
input ID callDate :ddmmyy. employment :$15. catvar2 :$1.;
format callDate ddmmyy10.;
datalines;
1 15/02/2021 full-time A
1 17/02/2021 full-time B
1 19/02/2021 full-time B
2 10/02/2021 part-time A
2 15/02/2021 part-time C
3 17/02/2021 full-time C
3 19/02/2021 full-time A
3 20/02/2021 full-time A
4 08/02/2021 full-time B
4 10/02/2021 full-time A
4 13/02/2021 full-time A
4 15/02/2021 full-time C
4 17/02/2021 full-time A
4 19/02/2021 full-time B
5 10/02/2021 unemployed B
;
To get more compact PROC SQL code you can put the repetitive part of the SELECT statement into a macro:
%macro mode(var);
(select max(x) from
(select x from
(select &var as x, count(*) as c
from have b
where b.calldate=a.calldate
group by x)
having c=max(c))) as mode_&var
%mend mode;
Then you just need a short macro call per categorical variable in the PROC SQL step:
proc sql;
create table daily_stats as
select calldate
, count(distinct id) as id_number
, %mode(employment)
, %mode(catvar2)
from have a
group by calldate;
quit;
Result:
mode_ mode_ Obs callDate id_number employment catvar2 1 08/02/2021 1 full-time B 2 10/02/2021 3 unemployed A 3 13/02/2021 1 full-time A 4 15/02/2021 3 full-time C 5 17/02/2021 3 full-time C 6 19/02/2021 3 full-time B 7 20/02/2021 1 full-time A
This works for character and numeric categorical variables.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.