Categorize multiple variables based on mean and sd

Occasional Contributor
Posts: 6

Hi all,

I need to categorize many variables (var1-var9) into three groups based on each variable's mean and SD. Is there syntax (a macro, perhaps?) that would do this faster and more accurately than writing it out for every variable, as below?

///

var1_3cat=.;
if var1 < (mean-sd) then var1_3cat=1;
if var1 >= (mean-sd) & var1 <= (mean+sd) then var1_3cat=2;
if var1 > (mean+sd) then var1_3cat=3;

///

 

Thank you!

haoduonge

Super User
Posts: 19,194

Re: Categorize multiple variables based on mean and sd

How are the mean and SD calculated? Are they unique to each variable, taken from a population value, or computed across all of the variables?

 

You may not need a macro. Look at PROC STDIZE or PROC STANDARD to standardize the variables, which is essentially what you're trying to do here.

Occasional Contributor
Posts: 6

Re: Categorize multiple variables based on mean and sd

The mean and SD are calculated with PROC MEANS and are unique to each variable.
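
For reference, the per-variable rule in the question can be sketched with PROC MEANS and arrays, with no macro. This is only a sketch under assumptions: the input data set is called have, and the names stats, m1-m9, s1-s9 and cat1-cat9 are made up for illustration.

```sas
/* One row of statistics: m1-m9 are the means, s1-s9 the standard deviations */
proc means data=have noprint;
  var var1-var9;
  output out=stats mean=m1-m9 std=s1-s9;
run;

data want;
  /* Read the statistics once; variables read by SET are retained on every row */
  if _n_ = 1 then set stats(keep=m1-m9 s1-s9);
  set have;
  array v{9} var1-var9;
  array m{9} m1-m9;
  array s{9} s1-s9;
  array c{9} cat1-cat9;   /* hypothetical names for the 3-level categories */
  do i = 1 to 9;
    if missing(v{i}) then c{i} = .;               /* keep missing as missing */
    else if v{i} <  m{i} - s{i} then c{i} = 1;    /* below mean - SD */
    else if v{i} <= m{i} + s{i} then c{i} = 2;    /* within one SD of the mean */
    else c{i} = 3;                                /* above mean + SD */
  end;
  drop i m1-m9 s1-s9;
run;
```

The explicit missing() check matters because SAS treats a missing value as smaller than any number, so a bare "var1 < (mean-sd)" test would put missing values in category 1.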

Thanks

PROC Star
Posts: 7,439

Re: Categorize multiple variables based on mean and sd

I would simply create an informat, calculate z-scores, and apply the informat to them, e.g.:

 

proc format;
  /* Numeric informat: maps a z-score to category 1, 2, or 3 */
  invalue zcat
  .=.
  low-< -1=1
  1<-high=3
  other=2
  ;
run;
data have;
  input score;
  mean=6;  /* hard-coded here for illustration */
  std=2;
  var1_3cat=input((score-mean)/std, zcat.);
  cards;
  cards;
2
3
4
.
5
6
7
8
9
10
;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 19,194

Re: Categorize multiple variables based on mean and sd


haoduonge wrote:

Mean and SD are calculated from proc means and unique to each variable.

Thanks


Proc stdize is a good option. Look at the METHOD options on the PROC statement. 
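
A minimal sketch of that route, assuming a data set named have with numeric variables var1-var9; the names zscores, want and cat1-cat9 are hypothetical. METHOD=STD subtracts each variable's mean and divides by its standard deviation, so the original cutoffs of mean-SD and mean+SD become -1 and 1.

```sas
/* Replace var1-var9 with their z-scores */
proc stdize data=have out=zscores method=std;
  var var1-var9;
run;

data want;
  set zscores;            /* var1-var9 now hold z-scores, not the raw values */
  array z{9} var1-var9;
  array c{9} cat1-cat9;   /* hypothetical names for the 3-level categories */
  do i = 1 to 9;
    if missing(z{i}) then c{i} = .;
    else if z{i} <  -1 then c{i} = 1;   /* below mean - SD */
    else if z{i} <=  1 then c{i} = 2;   /* within one SD of the mean */
    else c{i} = 3;                      /* above mean + SD */
  end;
  drop i;
run;
```

If the raw values are needed alongside the categories, they would have to be merged back from have, since this sketch overwrites them.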

Super User
Posts: 9,878

Re: Categorize multiple variables based on mean and sd

This is very easy to do in IML code.


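proc iml;
use sashelp.class;
read all var _num_ into x[c=vname];  /* all numeric variables; names go into vname */
close;

mean=mean(x);  /* column means */
std=std(x);    /* column standard deviations */

want=j(nrow(x),ncol(x),.);
do i=1 to ncol(x);
  /* bin edges: min, mean-sd, mean+sd, max */
  cutpoint=min(x[,i])||(mean[i]-std[i])||(mean[i]+std[i])||max(x[,i]);
  want[,i]=bin(x[,i],cutpoint);  /* bin number 1, 2, or 3 */
end;

print want[c=vname];

quit;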

