04-09-2017 12:13 AM
I need to categorize many variables (var1-var9) into three groups based on each variable's mean and SD. Is there syntax (a macro?) that would do this faster and more accurately than writing the statements below for every variable?
if var1 < (mean-sd) then var1_3cat=1;
if var1 >= (mean-sd) & var1 <= (mean+sd) then var1_3cat=2;
if var1 >(mean+sd) then var1_3cat=3;
04-09-2017 12:30 AM
How are the mean and sd calculated? Are they unique to each variable or from a population value or from all of the variables?
You may not need a macro. Look at PROC STDIZE or PROC STANDARD to standardize the variables, which is essentially what you're trying to do here.
04-09-2017 12:47 AM
I would simply create an informat, calculate z-scores, and apply the informat to the result, e.g.:
proc format;
  invalue zcat
    . = .
    low -< -1 = 1
    1.0000000001 - high = 3
    other = 2
  ;
run;
data have;
  input score;
  mean=6;
  std=2;
  var1_3cat=input((score-mean)/std, zcat.);
  cards;
2
3
4
.
5
6
7
8
9
10
;
Art, CEO, AnalystFinder.com
04-09-2017 01:06 AM
The mean and SD are calculated with PROC MEANS and are unique to each variable.
Proc stdize is a good option. Look at the METHOD options on the PROC statement.
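A minimal sketch of the PROC STDIZE route, in case it helps. It assumes the input data set is called HAVE (a hypothetical name) and holds numeric variables var1-var9. The work.zin and work.z data sets and the z1-z9 copies are also just illustrative names. METHOD=STD subtracts each variable's own mean and divides by its own standard deviation, so the cutoffs become -1 and +1 on the z-score scale:

```sas
/* Assumption: HAVE contains numeric variables var1-var9.           */
/* Copy them to z1-z9 so the original values are preserved.         */
data work.zin;
  set have;
  array v{9} var1-var9;
  array z{9} z1-z9;
  do i = 1 to 9;
    z{i} = v{i};
  end;
  drop i;
run;

/* Replace z1-z9 with z-scores: (value - mean) / std, per variable. */
proc stdize data=work.zin out=work.z method=std;
  var z1-z9;
run;

/* Turn each z-score into the 3-level category.                     */
data want;
  set work.z;
  array z{9} z1-z9;
  array c{9} var1_3cat var2_3cat var3_3cat var4_3cat var5_3cat
             var6_3cat var7_3cat var8_3cat var9_3cat;
  do i = 1 to 9;
    if missing(z{i}) then c{i} = .;   /* keep missing as missing    */
    else if z{i} < -1 then c{i} = 1;  /* below mean - sd            */
    else if z{i} <= 1 then c{i} = 2;  /* within one sd of the mean  */
    else c{i} = 3;                    /* above mean + sd            */
  end;
  drop i z1-z9;
run;
```

The explicit missing check matters because SAS treats a missing value as smaller than any number, so a plain "< -1" test would put missing values in category 1. Values lying exactly on a cutoff may land on either side because of floating-point rounding in the z-score.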
04-09-2017 07:34 AM
This is very easy with IML code.
proc iml;
  use sashelp.class;
  read all var _num_ into x[c=vname];
  close;
  mean=mean(x);
  std=std(x);
  want=j(nrow(x),ncol(x),.);
  do i=1 to ncol(x);
    cutpoint=min(x[,i])||(mean[i]-std[i])||(mean[i]+std[i])||max(x[,i]);
    want[,i]=bin(x[,i],cutpoint);
  end;
  print want[c=vname];
run;