Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 94
Accepted Solution

Obtaining 3 most frequently occuring values/strings in all variables in given dataset

I am trying to figure out how to obtain three most frequently occuring values/strings in all variables in my dataset. That is the first goal, consequently I will try the calculate absolute occurances of each of these values/strings.

 

At the moment I kind of struggle with proc freq. Is there any way how to get the same output as produced by the very last data step with out necessity to manipulate the data so much? Moreover, is it possible to dynamically set proc freq to all variables present within the dataset (rather than writing: tables "variable" and tables "variable_char" by hand and run it separetely for both)?

 

Here is the sample code:

 

%macro rnd(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend rnd;

data tmp;
do i = 1 to 100;
 variable = %rnd(1,5);
 output;
end;
run;

data adding_chars (drop=i);
set tmp;
if variable = 3 then variable_char = 'v';
if variable = 4 then variable_char = 'w';
if variable = 1 then variable_char = 'y';
if variable = 2 then variable_char = 'x';
if variable = 5 then variable_char = 'z';
run;

proc freq data=adding_chars order=freq noprint;
tables variable / list nopercent out=see;
run;

proc freq data=adding_chars order=freq noprint;
tables variable_char / list nopercent out=see1;
run;

data manipulate;
set see (keep=variable obs=3);
run;

data manipulate1;
set see1 (keep=variable_char obs=3);
run;

proc transpose data=manipulate out=fin;
run;

proc transpose data=manipulate1 out=fin1;
var variable_char;
run;

data mod;
set fin (rename=(col1 = colf col2 = cols col3 = colt));
col1 = compress(put(colf,$4.));
col2 = compress(put(cols,$4.));
col3 = compress(put(colt,$4.));
drop colf cols colt;
run;

data result;
set mod fin1;
run;

Thanks for your suggestions!


Accepted Solutions
Solution
‎10-31-2017 07:05 AM
Frequent Contributor
Posts: 94

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Uknown_user

Here is the full code that provides one with 3 most frequently occuring values/strings in all variables in given dataset:

 

ods table onewayfreqs=temp;
proc freq data=adding_chars order=freq;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

data MANIPULATED (drop=F_:);
set want;
variable = cats(of F_:);
run;

proc transpose data=manipulated out=first_trans (drop=_name_) prefix=freq;
by table;
run;

proc transpose data=MANIPULATED (keep=table variable) out=second_trans (drop=_name_) prefix=val;
by Table;
var variable;
run;

data final;
retain table val1 val2 val3 freq1 freq2 freq3;
merge
first_trans (in=a)
second_trans (in=b);
by table;
if a;
run;

View solution in original post


All Replies
Super User
Posts: 23,323

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Uknown_user

Use _Numeric_ for all numeric

Use _CHARACTER_ for a character variables

Use _ALL_ for all variables.

Super User
Posts: 23,323

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

[ Edited ]

And check PROC UNIVARIATE, specifically the examples for a shorter solution to your question.

 

I was thinking maximum, not frequency, for freq, use PROC FREQ. 

Here's a way to get all of them in one table, note that the second step is intended to format/clean up so you can adapt this to your own needs.

 

https://gist.github.com/statgeek/e0903d269d4a71316a4e

Super User
Posts: 6,637

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Uknown_user

Here's a first step, to put you on the right path when you have hundreds of variables.  This would replace your very first PROC FREQ:

 

ods listing off;

ods onewayfreqs=very_important;

proc freq data=have;

tables _all_;

run;

ods listing;

 

Then take a look at the data set VERY_IMPORTANT.  It will key contain frequency information for all variables and all tables ... not much coding involved to get that result, and all table results grouped into a single data set.  That should be easier to manipulate to come up with results to your liking.  There's still work to be done to get your top 3, but that's the easy part.

 

It's important to avoid the NOPRINT option if you want ODS to create its outputs, thus LISTING gets turned off then on again.

Frequent Contributor
Posts: 94

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Astounding

Thank you both guys for your suggestions, will try both and post the feedback here.

Frequent Contributor
Posts: 94

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Astounding

I have adopted this code:

 

ods table onewayfreqs=temp;
proc freq data=adding_chars order=freq;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

This actually produces this output:

Capture.PNG

Is there any simply way how to put all variables that start with F_: into one col so that it can be transposed easily? Thanks

Super User
Posts: 6,637

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Uknown_user

First let's look at what the results are telling you.  The first variable is numeric, so the output data set uses F_variable to store the values that it encountered.  The second variable is character so the output data set needs to use a second variable (F_variable_char) to store the values it encounters.  

 

To answer your question, you could use:

 

if f_variable > . then f_variable_char = put(f_variable, best.);

 

I'm not certain whether that's a good idea or not, but it's easy enough to convert all the numeric values to character.

Frequent Contributor
Posts: 94

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

[ Edited ]
Posted in reply to Uknown_user

I used this piece of code to concatenate all variables starting with F_:

 

data manipulated (drop=F_:);

set want;

var = cats(of F_:);

run;
Solution
‎10-31-2017 07:05 AM
Frequent Contributor
Posts: 94

Re: Obtaining 3 most frequently occuring values/strings in all variables in given dataset

Posted in reply to Uknown_user

Here is the full code that provides one with 3 most frequently occuring values/strings in all variables in given dataset:

 

ods table onewayfreqs=temp;
proc freq data=adding_chars order=freq;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

data MANIPULATED (drop=F_:);
set want;
variable = cats(of F_:);
run;

proc transpose data=manipulated out=first_trans (drop=_name_) prefix=freq;
by table;
run;

proc transpose data=MANIPULATED (keep=table variable) out=second_trans (drop=_name_) prefix=val;
by Table;
var variable;
run;

data final;
retain table val1 val2 val3 freq1 freq2 freq3;
merge
first_trans (in=a)
second_trans (in=b);
by table;
if a;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 147 views
  • 1 like
  • 3 in conversation