Categorize multiple variables based on mean and sd

Hi all,

I need to categorize many variables (var1-var9) based on their mean and SD. Could someone suggest syntax (a macro?) to do this faster and more accurately?

///

var1_3cat=.;

if var1 < (mean-sd) then var1_3cat=1;

if var1 >= (mean-sd) & var1 <= (mean+sd) then var1_3cat=2;

if var1 >(mean+sd) then var1_3cat=3;

///

Thank you!

haoduonge

How are the mean and sd calculated? Are they unique to each variable or from a population value or from all of the variables?

You may not need a macro. Look at PROC STDIZE or PROC STANDARD to standardize the variables, which is essentially what you're trying to do here.

The mean and SD are calculated with PROC MEANS and are unique to each variable.

Thanks
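
Since the statistics come from PROC MEANS and differ per variable, one macro-free way to apply the three-way split is to write the means and SDs to a one-row data set and loop over arrays. The following is only a sketch under stated assumptions: the data set names `work.have`, `work.stats` and `work.want`, the generated sample data, and the helper names `m1-m9`, `s1-s9`, `x`, `mu`, `sd` and `grp` are all hypothetical and not from the original posts.

```sas
/* Hypothetical sample data: 20 rows of var1-var9 */
data work.have;
  call streaminit(2024);
  array x{9} var1-var9;
  do id = 1 to 20;
    do j = 1 to 9;
      x{j} = rand('normal', 50, 10);
    end;
    output;
  end;
  drop j;
run;

/* One row of statistics: m1-m9 hold the means, s1-s9 the SDs */
proc means data=work.have noprint;
  var var1-var9;
  output out=work.stats(drop=_type_ _freq_) mean=m1-m9 std=s1-s9;
run;

data work.want;
  /* Read the statistics once; variables read by SET are retained */
  if _n_ = 1 then set work.stats;
  set work.have;
  array x{9}   var1-var9;
  array mu{9}  m1-m9;
  array sd{9}  s1-s9;
  array grp{9} var1_3cat var2_3cat var3_3cat var4_3cat var5_3cat
               var6_3cat var7_3cat var8_3cat var9_3cat;
  do i = 1 to 9;
    if missing(x{i}) then grp{i} = .;
    else if x{i} <  mu{i} - sd{i} then grp{i} = 1;  /* below mean - SD  */
    else if x{i} <= mu{i} + sd{i} then grp{i} = 2;  /* within one SD    */
    else grp{i} = 3;                                /* above mean + SD  */
  end;
  drop i m1-m9 s1-s9;
run;
```

The `else if` chain keeps the boundaries from the original question (values equal to mean-sd or mean+sd land in category 2) and leaves missing values missing.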

I would simply create an informat, calculate z-scores, and apply the informat to them, e.g.:

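```
proc format;
invalue zcat
.=.                /* missing stays missing */
low-< -1=1         /* z < -1 */
1<-high=3          /* z > 1 (replaces the 1.0000000001 workaround) */
other=2            /* -1 <= z <= 1 */
;
run;

data have;
input score;
mean=6;
std=2;
/* INPUT expects character text, so SAS converts the numeric */
/* z-score implicitly and writes a conversion note to the log */
var1_3cat=input((score-mean)/std, zcat.);
cards;
2
3
4
.
5
6
7
8
9
10
;
run;
```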

haoduonge wrote:

Mean and SD are calculated from proc means and unique to each variable.

Thanks

Proc stdize is a good option. Look at the METHOD options on the PROC statement.
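
As a rough illustration of the PROC STDIZE suggestion, the sketch below standardizes copies of the variables with `METHOD=STD` (mean as location, standard deviation as scale) and then cuts the z-scores at -1 and 1. It assumes SAS/STAT is licensed; the data set names `work.have`, `work.zprep`, `work.zscores` and `work.want`, the generated sample data, and the helper variables `z1-z9` are hypothetical and not from the original posts.

```sas
/* Hypothetical sample data: 20 rows of var1-var9 */
data work.have;
  call streaminit(2024);
  array x{9} var1-var9;
  do id = 1 to 20;
    do j = 1 to 9;
      x{j} = rand('normal', 50, 10);
    end;
    output;
  end;
  drop j;
run;

/* Copy var1-var9 into z1-z9 so the original values survive */
data work.zprep;
  set work.have;
  array x{9} var1-var9;
  array z{9} z1-z9;
  do i = 1 to 9;
    z{i} = x{i};
  end;
  drop i;
run;

/* Replace z1-z9 with (value - mean) / SD, column by column */
proc stdize data=work.zprep out=work.zscores method=std;
  var z1-z9;
run;

data work.want;
  set work.zscores;
  array z{9}   z1-z9;
  array grp{9} var1_3cat var2_3cat var3_3cat var4_3cat var5_3cat
               var6_3cat var7_3cat var8_3cat var9_3cat;
  do i = 1 to 9;
    if missing(z{i}) then grp{i} = .;
    else if z{i} <  -1 then grp{i} = 1;  /* below mean - SD */
    else if z{i} <=  1 then grp{i} = 2;  /* within one SD   */
    else grp{i} = 3;                     /* above mean + SD */
  end;
  drop i z1-z9;
run;
```

Because the cutoffs are applied to computed z-scores, a value lying exactly on mean-sd or mean+sd could fall on either side after floating-point rounding; if those boundary cases matter, comparing the raw values against the stored mean and SD is the safer route.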

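This is very easy with IML code.

```
proc iml;
use sashelp.class;
read all var _num_ into x[c=vname];   /* all numeric columns */
close sashelp.class;

mean=mean(x);   /* row vector of column means */
std=std(x);     /* row vector of column standard deviations */

want=j(nrow(x),ncol(x),.);
do i=1 to ncol(x);
/* cut points: min, mean-sd, mean+sd, max give three bins per column */
cutpoint=min(x[,i])||(mean[i]-std[i])||(mean[i]+std[i])||max(x[,i]);
/* BIN treats each interval as closed on the left, so a value exactly */
/* equal to mean+sd is placed in category 3 rather than 2 */
want[,i]=bin(x[,i],cutpoint);
end;

print want[c=vname];

quit;
```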
