BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Uknown_user
Quartz | Level 8

I am trying to figure out how to obtain three most frequently occuring values/strings in all variables in my dataset. That is the first goal, consequently I will try the calculate absolute occurances of each of these values/strings.

 

At the moment I kind of struggle with proc freq. Is there any way how to get the same output as produced by the very last data step with out necessity to manipulate the data so much? Moreover, is it possible to dynamically set proc freq to all variables present within the dataset (rather than writing: tables "variable" and tables "variable_char" by hand and run it separetely for both)?

 

Here is the sample code:

 

%macro rnd(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend rnd;

data tmp;
do i = 1 to 100;
 variable = %rnd(1,5);
 output;
end;
run;

data adding_chars (drop=i);
set tmp;
if variable = 3 then variable_char = 'v';
if variable = 4 then variable_char = 'w';
if variable = 1 then variable_char = 'y';
if variable = 2 then variable_char = 'x';
if variable = 5 then variable_char = 'z';
run;

proc freq data=adding_chars order=freq noprint;
tables variable / list nopercent out=see;
run;

proc freq data=adding_chars order=freq noprint;
tables variable_char / list nopercent out=see1;
run;

data manipulate;
set see (keep=variable obs=3);
run;

data manipulate1;
set see1 (keep=variable_char obs=3);
run;

proc transpose data=manipulate out=fin;
run;

proc transpose data=manipulate1 out=fin1;
var variable_char;
run;

data mod;
set fin (rename=(col1 = colf col2 = cols col3 = colt));
col1 = compress(put(colf,$4.));
col2 = compress(put(cols,$4.));
col3 = compress(put(colt,$4.));
drop colf cols colt;
run;

data result;
set mod fin1;
run;

Thanks for your suggestions!

1 ACCEPTED SOLUTION

Accepted Solutions
Uknown_user
Quartz | Level 8

Here is the full code that provides one with 3 most frequently occuring values/strings in all variables in given dataset:

 

ods table onewayfreqs=temp;
proc freq data=adding_chars order=freq;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

data MANIPULATED (drop=F_:);
set want;
variable = cats(of F_:);
run;

proc transpose data=manipulated out=first_trans (drop=_name_) prefix=freq;
by table;
run;

proc transpose data=MANIPULATED (keep=table variable) out=second_trans (drop=_name_) prefix=val;
by Table;
var variable;
run;

data final;
retain table val1 val2 val3 freq1 freq2 freq3;
merge
first_trans (in=a)
second_trans (in=b);
by table;
if a;
run;

View solution in original post

8 REPLIES 8
Reeza
Super User

Use _Numeric_ for all numeric

Use _CHARACTER_ for a character variables

Use _ALL_ for all variables.

Reeza
Super User

And check PROC UNIVARIATE, specifically the examples for a shorter solution to your question.

 

I was thinking maximum, not frequency, for freq, use PROC FREQ. 

Here's a way to get all of them in one table, note that the second step is intended to format/clean up so you can adapt this to your own needs.

 

https://gist.github.com/statgeek/e0903d269d4a71316a4e

Astounding
PROC Star

Here's a first step, to put you on the right path when you have hundreds of variables.  This would replace your very first PROC FREQ:

 

ods listing off;

ods onewayfreqs=very_important;

proc freq data=have;

tables _all_;

run;

ods listing;

 

Then take a look at the data set VERY_IMPORTANT.  It will key contain frequency information for all variables and all tables ... not much coding involved to get that result, and all table results grouped into a single data set.  That should be easier to manipulate to come up with results to your liking.  There's still work to be done to get your top 3, but that's the easy part.

 

It's important to avoid the NOPRINT option if you want ODS to create its outputs, thus LISTING gets turned off then on again.

Uknown_user
Quartz | Level 8

Thank you both guys for your suggestions, will try both and post the feedback here.

Uknown_user
Quartz | Level 8

I have adopted this code:

 

ods table onewayfreqs=temp;
proc freq data=adding_chars order=freq;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

This actually produces this output:

Capture.PNG

Is there any simply way how to put all variables that start with F_: into one col so that it can be transposed easily? Thanks

Astounding
PROC Star

First let's look at what the results are telling you.  The first variable is numeric, so the output data set uses F_variable to store the values that it encountered.  The second variable is character so the output data set needs to use a second variable (F_variable_char) to store the values it encounters.  

 

To answer your question, you could use:

 

if f_variable > . then f_variable_char = put(f_variable, best.);

 

I'm not certain whether that's a good idea or not, but it's easy enough to convert all the numeric values to character.

Uknown_user
Quartz | Level 8

I used this piece of code to concatenate all variables starting with F_:

 

data manipulated (drop=F_:);

set want;

var = cats(of F_:);

run;
Uknown_user
Quartz | Level 8

Here is the full code that provides one with 3 most frequently occuring values/strings in all variables in given dataset:

 

ods table onewayfreqs=temp;
proc freq data=adding_chars order=freq;
table _all_ / list nopercent nocum;
run;

data want;
set temp (keep=table frequency f_:);
table = substr(table,7,length(table)-6);
by table;
if first.table then count = 0;
count + 1;
if count le 3 then output;
drop count;
run;

data MANIPULATED (drop=F_:);
set want;
variable = cats(of F_:);
run;

proc transpose data=manipulated out=first_trans (drop=_name_) prefix=freq;
by table;
run;

proc transpose data=MANIPULATED (keep=table variable) out=second_trans (drop=_name_) prefix=val;
by Table;
var variable;
run;

data final;
retain table val1 val2 val3 freq1 freq2 freq3;
merge
first_trans (in=a)
second_trans (in=b);
by table;
if a;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 648 views
  • 1 like
  • 3 in conversation