Re: how to recode variable using counts

deleted_user · Posted 09-10-2010 02:44 PM

Hi

This might be a simple question but I would appreciate some help. I have a series of categorical variables that I would like to recode based on their frequency/count.

So, for example, if I had a series of records in the variable being a, a, a, b, b, c, I would like to recode my variable so that 'a ' (having the highest count) would be coded as 3 and 'c' (having the lowest count) would be coded as 1. Since I have a series of variables it would be hard to recode them manually so was wondering whether there was a command to easily do this.

Thanks!

art297 · Posted 09-10-2010 03:12 PM

Can you post two sample files, namely what you have and the file that you'd like to have?

Somehow, I don't think you would find a record that showed: 3,3,3,2,2,1 of much use.

Art

Sandhya · Posted 09-10-2010 03:33 PM

I'm assuming that you need to rank them by their frequency.

proc sql noprint;
create table sample1 as select var1, monotonic() as rank from
(select distinct var1,
count(var1) as count1
from work.sample
group by var1
order by count1)
order by rank;
quit;

Thanks,
Sandhya.

deleted_user · Posted 09-10-2010 05:31 PM

Hi Sandhya,
Yes I wanted to rank the variables by their frequency. Thanks for this! I'll let you know whether I'm successful.

data_null__ · Posted 09-11-2010 07:44 AM

PROC SUMMARY has everything you need to compute the ordered frequency and create a variable to ID each level of the character variable based on frequency.

[pre]
data test;
do _n_ = 1 to 20;
length c $1;
c = substr('abcdefghi',rantbl(123456,.2,.1,.4,.15),1);
output;
end;
run;
proc print;
run;
proc summary nway;
class c / order=freq ascending;
output out=code(drop=_type_ index=(c)) / levels;
run;
proc print;
run;
data test2;
set test;
set code key=c/unique;
run;
proc print;
run;
[/pre]

Ksharp · Posted 09-13-2010 10:18 PM

[pre]
data test;
do _n_ = 1 to 20;
length c $1;
c = substr('abcdefghi',rantbl(123456,.2,.1,.4,.15),1);
output;
end;
run;
proc freq data=test ;
tables c /out=freq;
run;
proc sort data=test;
by c;
run;
data result;
merge test freq;
by c;
keep c count;
run;
proc print noobs;
run;
[/pre]

Ksharp

Cynthia_sas · Posted 09-10-2010 09:36 PM

Hi:
This Tech Support note:
http://support.sas.com/kb/15/138.html

says

"The MONOTONIC() function is not supported in PROC SQL. Using the
MONOTONIC() function in PROC SQL can cause missing or non-sequential
values to be returned."

So before you go too far down this SQL road with MONOTONIC, you may want to post a small sample of fake data and show the desired results and tell us, in addition:
1) whether your desired output is a report or a dataset
2) if a report, what your ODS destination of interest is (HTML, RTF, PDF)

...and perhaps someone can come up with some ideas that do not require the use of MONOTONIC.

cynthia

Cynthia_sas · Posted 09-10-2010 10:39 PM

An additional thought.

If all you want is a report (as opposed to actually recoding the variables), then perhaps PROC FREQ will do what you want:
[pre]
data myrecs;
infile datalines;
input name $ var1 $ var2 $ var3;
return;
datalines;
alan b x 4
bob c x 5
carla a x 5
don c x 5
ed a y 6
frank a y 6
;
run;

ods listing close;
ods html file='c:\temp\rankvars.html' style=sasweb;
proc freq data=myrecs order=freq;
tables var: ;
run;
ods html close;
[/pre]

If what you want to do is to show the detail observations/records, but in the order of their rank, then you can do that with PROC REPORT, as shown below (and, PROC REPORT can also make you an output dataset). Since you can only order by one variable at a time, I coded the PROC REPORT step to order by the descending frequency for values of VAR1.

cynthia
[pre]
ods listing close;
ods html file='c:\temp\sol_PROC_REPORT.html' style=sasweb;
proc report data=myrecs nowd
out=rankv1out(drop=n var1show where=(_break_ eq ' '));
title '1) PROC REPORT output in order of Var1 Frequency';
title2 'you may or may not want VAR1SHOW variable in the report';
column var1 name var1show var2 var3 n rankv1;
define var1 / order order=freq descending noprint;
define name / display;
define var1show / computed;
define var2 / display;
define var3 / display;
define n / noprint;
define rankv1 / computed;
break before var1 / ;
compute before var1;
hold = var1;
holdcnt = n;
endcomp;
compute var1show / character length=1;
var1show = hold;
endcomp;
compute rankv1;
rankv1 = holdcnt;
endcomp;
run;

proc print data=rankv1out(drop=_break_);
title '2) Output Dataset created by PROC REPORT';
run;
ods html close;
[/pre]

chang_y_chung_hotmail_com · Posted 09-14-2010 05:20 PM

Most of the answers so far seem to miss the OP's point -- which is that OP has a *series* variables to recode. _null_'s brilliant use of proc summary does not seem to extend nicely to multiple variables -- repeatedly running one proc summary per variable can be quite expensive if the data are large.

This kind of vector-oriented operations are, by nature, very easy to implement in vector/matrix oriented languages like R, but quite awkward for Base SAS -- IML may handle this more easily.

Below is my try using Base SAS and some macro variables. It assumes an input of two macro variables, olds and news, both are a single-blank delimited list of variable names, to recode from, and to be recoded into, respectively. The recoding is done in such a way that if there were ties in terms of frequencies, then they are broken by the value's alphabetical order. Missing values are properly propagated.

Code below is admittedly ugly at best and I am not really happy with it. I would love to post this question on sas-l to see what they can come up with.

[pre]
/* test data */
data one;
input id (v1-v5) ($);
cards;
1 a a a . a
2 a b b . a
3 a c c . a
4 b c d . a
5 b c e . a
6 c c . . a
;
run;

%let olds = v1 v2 v3 v4 v5;
%let news = n1 n2 n3 n4 n5;

ods output OneWayFreqs=freqs;
proc freq data=one order=freq;
tables &olds;
run;
ods output clear;
proc freq data=freqs;
tables table/out=levels(keep=table count);
run;

%let cOlds = %sysfunc(translate(&olds,%str(,),%str( )));
data lookup;
length name $32 value $200;
do i = 0 by 1 until (last.table);
merge freqs levels;
by table;
name = scan(table, 2);
value = coalescec(&cOlds);
target = count - i;
if (frequency > 0) then output;
end;
keep name value target frequency;
run;

data two;
set one;
if _n_ = 1 then do;
dcl hash h(dataset:"lookup");
h.definekey('name', 'value');
h.definedata('target');
h.definedone();
end;
array olds{*} $ &olds;
array news{*} &news;
do i = 1 to dim(olds);
name = vname(olds{i});
value = olds{i};
if (h.find()^=0) then target = .;
news{i} = target;
end;
keep id &olds &news;
run;

/* check */
proc print data=two noobs;
run;
/* on lst
id v1 v2 v3 v4 v5 n1 n2 n3 n4 n5
1 a a a a 3 2 5 . 1
2 a b b a 3 1 4 . 1
3 a c c a 3 3 3 . 1
4 b c d a 2 3 2 . 1
5 b c e a 2 3 1 . 1
6 c c a 1 3 . . 1
*/
[/pre]

chang_y_chung_hotmail_com · Posted 09-16-2010 12:17 PM

SAS-L does not disappoint. Check out these excellent solutions:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1009c&L=sas-l&D=1&P=5849
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1009c&L=sas-l&D=1&O=D&P=6542

polingjw · Posted 09-20-2010 07:25 AM

Here is an alternative solution. This solution uses one sql step and then a single pass through the data with a data step:

[pre]
%macro recode(dataset, vars);
proc sql noprint;
%let i=1;
%do %until(%scan(&vars, &i) = );
%let var = %scan(&vars, &i);
%local &var;
select &var into: &var separated by ' ' from
(select &var, count(*) as count from &dataset where &var ne '' group by &var)
order by count, &var desc;
%let i = %eval(&i + 1);
%end;
quit;

data &dataset._recode;
set &dataset;
%let i = 1;
%do %until(%scan(&vars, &i)= );
%let var = %scan(&vars, &i);
%let j=1;
%do %until(%scan(&&&var, &j)= );
%if %eval(&j > 1) %then else ;
if &var = "%scan(&&&var, &j)" and &var ne "" then &var._recode = &j;
%let j = %eval(&j + 1);
%end;
else &var._recode = .;
%let i = %eval(&i + 1);
%end;
run;
%mend;

data one;
input id (v1-v5) ($);
cards;
1 a a a . a
2 a b b . a
3 a c c . a
4 b c d . a
5 b c e . a
6 c c . . a
;
run;

%recode(one, v1 v2 v3 v4 v5)

proc print data=one_recode;
run;
[/pre]

Message was edited by: polingjw Message was edited by: polingjw

Registration is open

SAS Training: Just a Click Away