## Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Solved
Frequent Contributor
Posts: 94

# Obtaining 3 most frequently occuring values/strings in all variables in given dataset

I am trying to figure out how to obtain three most frequently occuring values/strings in all variables in my dataset. That is the first goal, consequently I will try the calculate absolute occurances of each of these values/strings.

At the moment I kind of struggle with proc freq. Is there any way how to get the same output as produced by the very last data step with out necessity to manipulate the data so much? Moreover, is it possible to dynamically set proc freq to all variables present within the dataset (rather than writing: tables "variable" and tables "variable_char" by hand and run it separetely for both)?

Here is the sample code:

```%macro rnd(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend rnd;

data tmp;
do i = 1 to 100;
variable = %rnd(1,5);
output;
end;
run;

set tmp;
if variable = 3 then variable_char = 'v';
if variable = 4 then variable_char = 'w';
if variable = 1 then variable_char = 'y';
if variable = 2 then variable_char = 'x';
if variable = 5 then variable_char = 'z';
run;

tables variable / list nopercent out=see;
run;

tables variable_char / list nopercent out=see1;
run;

data manipulate;
set see (keep=variable obs=3);
run;

data manipulate1;
set see1 (keep=variable_char obs=3);
run;

proc transpose data=manipulate out=fin;
run;

proc transpose data=manipulate1 out=fin1;
var variable_char;
run;

data mod;
set fin (rename=(col1 = colf col2 = cols col3 = colt));
col1 = compress(put(colf,\$4.));
col2 = compress(put(cols,\$4.));
col3 = compress(put(colt,\$4.));
drop colf cols colt;
run;

data result;
set mod fin1;
run;```

Accepted Solutions
Solution
‎10-31-2017 07:05 AM
Frequent Contributor
Posts: 94

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Here is the full code that provides one with 3 most frequently occuring values/strings in all variables in given dataset:

```ods table onewayfreqs=temp;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

data MANIPULATED (drop=F_:);
set want;
variable = cats(of F_:);
run;

proc transpose data=manipulated out=first_trans (drop=_name_) prefix=freq;
by table;
run;

proc transpose data=MANIPULATED (keep=table variable) out=second_trans (drop=_name_) prefix=val;
by Table;
var variable;
run;

data final;
retain table val1 val2 val3 freq1 freq2 freq3;
merge
first_trans (in=a)
second_trans (in=b);
by table;
if a;
run;```

All Replies
Super User
Posts: 23,323

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Use _Numeric_ for all numeric

Use _CHARACTER_ for a character variables

Use _ALL_ for all variables.

Super User
Posts: 23,323

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

[ Edited ]

And check PROC UNIVARIATE, specifically the examples for a shorter solution to your question.

I was thinking maximum, not frequency, for freq, use PROC FREQ.

Here's a way to get all of them in one table, note that the second step is intended to format/clean up so you can adapt this to your own needs.

https://gist.github.com/statgeek/e0903d269d4a71316a4e

Super User
Posts: 6,637

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Here's a first step, to put you on the right path when you have hundreds of variables.  This would replace your very first PROC FREQ:

ods listing off;

ods onewayfreqs=very_important;

proc freq data=have;

tables _all_;

run;

ods listing;

Then take a look at the data set VERY_IMPORTANT.  It will key contain frequency information for all variables and all tables ... not much coding involved to get that result, and all table results grouped into a single data set.  That should be easier to manipulate to come up with results to your liking.  There's still work to be done to get your top 3, but that's the easy part.

It's important to avoid the NOPRINT option if you want ODS to create its outputs, thus LISTING gets turned off then on again.

Frequent Contributor
Posts: 94

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Thank you both guys for your suggestions, will try both and post the feedback here.

Frequent Contributor
Posts: 94

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

```ods table onewayfreqs=temp;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;```

This actually produces this output:

Is there any simply way how to put all variables that start with F_: into one col so that it can be transposed easily? Thanks

Super User
Posts: 6,637

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

First let's look at what the results are telling you.  The first variable is numeric, so the output data set uses F_variable to store the values that it encountered.  The second variable is character so the output data set needs to use a second variable (F_variable_char) to store the values it encounters.

if f_variable > . then f_variable_char = put(f_variable, best.);

I'm not certain whether that's a good idea or not, but it's easy enough to convert all the numeric values to character.

Frequent Contributor
Posts: 94

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

[ Edited ]

I used this piece of code to concatenate all variables starting with F_:

```data manipulated (drop=F_:);

set want;

var = cats(of F_:);

run;```
Solution
‎10-31-2017 07:05 AM
Frequent Contributor
Posts: 94

## Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Here is the full code that provides one with 3 most frequently occuring values/strings in all variables in given dataset:

```ods table onewayfreqs=temp;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

data MANIPULATED (drop=F_:);
set want;
variable = cats(of F_:);
run;

proc transpose data=manipulated out=first_trans (drop=_name_) prefix=freq;
by table;
run;

proc transpose data=MANIPULATED (keep=table variable) out=second_trans (drop=_name_) prefix=val;
by Table;
var variable;
run;

data final;
retain table val1 val2 val3 freq1 freq2 freq3;
merge
first_trans (in=a)
second_trans (in=b);
by table;
if a;
run;```
☑ This topic is solved.