BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi

This might be a simple question but I would appreciate some help. I have a series of categorical variables that I would like to recode based on their frequency/count.

So, for example, if I had a series of records in the variable being a, a, a, b, b, c, I would like to recode my variable so that 'a ' (having the highest count) would be coded as 3 and 'c' (having the lowest count) would be coded as 1. Since I have a series of variables it would be hard to recode them manually so was wondering whether there was a command to easily do this.

Thanks!
10 REPLIES 10
art297
Opal | Level 21
Can you post two sample files, namely what you have and the file that you'd like to have?

Somehow, I don't think you would find a record that showed: 3,3,3,2,2,1 of much use.

Art
Sandhya
Fluorite | Level 6
I'm assuming that you need to rank them by their frequency.

proc sql noprint;
create table sample1 as select var1, monotonic() as rank from
(select distinct var1,
count(var1) as count1
from work.sample
group by var1
order by count1)
order by rank;
quit;

Thanks,
Sandhya.
deleted_user
Not applicable
Hi Sandhya,
Yes I wanted to rank the variables by their frequency. Thanks for this! I'll let you know whether I'm successful.
data_null__
Jade | Level 19
PROC SUMMARY has everything you need to compute the ordered frequency and create a variable to ID each level of the character variable based on frequency.

[pre]
data test;
do _n_ = 1 to 20;
length c $1;
c = substr('abcdefghi',rantbl(123456,.2,.1,.4,.15),1);
output;
end;
run;
proc print;
run;
proc summary nway;
class c / order=freq ascending;
output out=code(drop=_type_ index=(c)) / levels;
run;

proc print;
run;
data test2;
set test;
set code key=c/unique;
run;
proc print;
run;
[/pre]
Ksharp
Super User
[pre]
data test;
do _n_ = 1 to 20;
length c $1;
c = substr('abcdefghi',rantbl(123456,.2,.1,.4,.15),1);
output;
end;
run;
proc freq data=test ;
tables c /out=freq;
run;
proc sort data=test;
by c;
run;
data result;
merge test freq;
by c;
keep c count;
run;
proc print noobs;
run;
[/pre]



Ksharp
Cynthia_sas
SAS Super FREQ
Hi:
This Tech Support note:
http://support.sas.com/kb/15/138.html

says

"The MONOTONIC() function is not supported in PROC SQL. Using the
MONOTONIC() function in PROC SQL can cause missing or non-sequential
values to be returned."


So before you go too far down this SQL road with MONOTONIC, you may want to post a small sample of fake data and show the desired results and tell us, in addition:
1) whether your desired output is a report or a dataset
2) if a report, what your ODS destination of interest is (HTML, RTF, PDF)

...and perhaps someone can come up with some ideas that do not require the use of MONOTONIC.

cynthia
Cynthia_sas
SAS Super FREQ
An additional thought.

If all you want is a report (as opposed to actually recoding the variables), then perhaps PROC FREQ will do what you want:
[pre]
data myrecs;
infile datalines;
input name $ var1 $ var2 $ var3;
return;
datalines;
alan b x 4
bob c x 5
carla a x 5
don c x 5
ed a y 6
frank a y 6
;
run;

ods listing close;
ods html file='c:\temp\rankvars.html' style=sasweb;
proc freq data=myrecs order=freq;
tables var: ;
run;
ods html close;
[/pre]

If what you want to do is to show the detail observations/records, but in the order of their rank, then you can do that with PROC REPORT, as shown below (and, PROC REPORT can also make you an output dataset). Since you can only order by one variable at a time, I coded the PROC REPORT step to order by the descending frequency for values of VAR1.

cynthia
[pre]
ods listing close;
ods html file='c:\temp\sol_PROC_REPORT.html' style=sasweb;
proc report data=myrecs nowd
out=rankv1out(drop=n var1show where=(_break_ eq ' '));
title '1) PROC REPORT output in order of Var1 Frequency';
title2 'you may or may not want VAR1SHOW variable in the report';
column var1 name var1show var2 var3 n rankv1;
define var1 / order order=freq descending noprint;
define name / display;
define var1show / computed;
define var2 / display;
define var3 / display;
define n / noprint;
define rankv1 / computed;
break before var1 / ;
compute before var1;
hold = var1;
holdcnt = n;
endcomp;
compute var1show / character length=1;
var1show = hold;
endcomp;
compute rankv1;
rankv1 = holdcnt;
endcomp;
run;

proc print data=rankv1out(drop=_break_);
title '2) Output Dataset created by PROC REPORT';
run;
ods html close;
[/pre]
chang_y_chung_hotmail_com
Obsidian | Level 7
Most of the answers so far seem to miss the OP's point -- which is that OP has a *series* variables to recode. _null_'s brilliant use of proc summary does not seem to extend nicely to multiple variables -- repeatedly running one proc summary per variable can be quite expensive if the data are large.

This kind of vector-oriented operations are, by nature, very easy to implement in vector/matrix oriented languages like R, but quite awkward for Base SAS -- IML may handle this more easily.

Below is my try using Base SAS and some macro variables. It assumes an input of two macro variables, olds and news, both are a single-blank delimited list of variable names, to recode from, and to be recoded into, respectively. The recoding is done in such a way that if there were ties in terms of frequencies, then they are broken by the value's alphabetical order. Missing values are properly propagated.

Code below is admittedly ugly at best and I am not really happy with it. I would love to post this question on sas-l to see what they can come up with.

[pre]
/* test data */
data one;
input id (v1-v5) ($);
cards;
1 a a a . a
2 a b b . a
3 a c c . a
4 b c d . a
5 b c e . a
6 c c . . a
;
run;

%let olds = v1 v2 v3 v4 v5;
%let news = n1 n2 n3 n4 n5;

ods output OneWayFreqs=freqs;
proc freq data=one order=freq;
tables &olds;
run;
ods output clear;
proc freq data=freqs;
tables table/out=levels(keep=table count);
run;

%let cOlds = %sysfunc(translate(&olds,%str(,),%str( )));
data lookup;
length name $32 value $200;
do i = 0 by 1 until (last.table);
merge freqs levels;
by table;
name = scan(table, 2);
value = coalescec(&cOlds);
target = count - i;
if (frequency > 0) then output;
end;
keep name value target frequency;
run;

data two;
set one;
if _n_ = 1 then do;
dcl hash h(dataset:"lookup");
h.definekey('name', 'value');
h.definedata('target');
h.definedone();
end;
array olds{*} $ &olds;
array news{*} &news;
do i = 1 to dim(olds);
name = vname(olds{i});
value = olds{i};
if (h.find()^=0) then target = .;
news{i} = target;
end;
keep id &olds &news;
run;

/* check */
proc print data=two noobs;
run;
/* on lst
id v1 v2 v3 v4 v5 n1 n2 n3 n4 n5
1 a a a a 3 2 5 . 1
2 a b b a 3 1 4 . 1
3 a c c a 3 3 3 . 1
4 b c d a 2 3 2 . 1
5 b c e a 2 3 1 . 1
6 c c a 1 3 . . 1
*/
[/pre]
polingjw
Quartz | Level 8
Here is an alternative solution. This solution uses one sql step and then a single pass through the data with a data step:

[pre]
%macro recode(dataset, vars);
proc sql noprint;
%let i=1;
%do %until(%scan(&vars, &i) = );
%let var = %scan(&vars, &i);
%local &var;
select &var into: &var separated by ' ' from
(select &var, count(*) as count from &dataset where &var ne '' group by &var)
order by count, &var desc;
%let i = %eval(&i + 1);
%end;
quit;

data &dataset._recode;
set &dataset;
%let i = 1;
%do %until(%scan(&vars, &i)= );
%let var = %scan(&vars, &i);
%let j=1;
%do %until(%scan(&&&var, &j)= );
%if %eval(&j > 1) %then else ;
if &var = "%scan(&&&var, &j)" and &var ne "" then &var._recode = &j;
%let j = %eval(&j + 1);
%end;
else &var._recode = .;
%let i = %eval(&i + 1);
%end;
run;
%mend;

data one;
input id (v1-v5) ($);
cards;
1 a a a . a
2 a b b . a
3 a c c . a
4 b c d . a
5 b c e . a
6 c c . . a
;
run;

%recode(one, v1 v2 v3 v4 v5)

proc print data=one_recode;
run;
[/pre]

Message was edited by: polingjw Message was edited by: polingjw

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1604 views
  • 0 likes
  • 8 in conversation