Hi all,
I need to categorize many variables (var1-var9) into three groups based on each variable's mean and SD. Is there syntax (a macro?) that would do this faster and more accurately than repeating the code below for every variable?
///
var1_3cat=.;
if var1 < (mean-sd) then var1_3cat=1;
if var1 >= (mean-sd) & var1 <= (mean+sd) then var1_3cat=2;
if var1 >(mean+sd) then var1_3cat=3;
///
Thank you!
haoduonge
How are the mean and sd calculated? Are they unique to each variable or from a population value or from all of the variables?
You may not need a macro. Look at PROC STDIZE or PROC STANDARD to standardize the variables, which is essentially what you're trying to do here.
The mean and SD are calculated with proc means and are unique to each variable.
Thanks
I would simply create an informat, calculate z-scores, and apply the informat to that calculation, e.g.:
proc format;
  invalue zcat
    . = .
    low -< -1 = 1
    1.0000000001 - high = 3
    other = 2
  ;
run;

data have;
  input score;
  mean=6;
  std=2;
  var1_3cat=input((score-mean)/std, zcat.);
  cards;
2
3
4
.
5
6
7
8
9
10
;
Art, CEO, AnalystFinder.com
@haoduonge wrote:
The mean and SD are calculated with proc means and are unique to each variable.
Thanks
Proc stdize is a good option. Look at the METHOD options on the PROC statement.
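To illustrate the standardize-first idea, here is a minimal sketch that uses PROC STANDARD followed by a DATA step with arrays. It assumes the input data set is called HAVE and contains var1-var9; the names ZSCORES, WANT, z1-z9 and the array names are made up for the example. Because it compares z-scores with -1 and +1 rather than comparing raw values with mean-sd and mean+sd, a value sitting exactly on a cutpoint could land in a different group due to floating-point rounding.

```sas
/* Sketch only: HAVE, ZSCORES, WANT and z1-z9 are assumed names. */

/* Step 1: replace var1-var9 with z-scores, using each variable's own mean and SD. */
proc standard data=have mean=0 std=1 out=zscores;
  var var1-var9;
run;

/* Step 2: cut every z-score at -1 and +1. */
data want;
  set zscores(rename=(var1-var9=z1-z9));
  array z{9} z1-z9;
  array grp{9} var1_3cat var2_3cat var3_3cat var4_3cat var5_3cat
               var6_3cat var7_3cat var8_3cat var9_3cat;
  do i = 1 to 9;
    if missing(z{i}) then grp{i} = .;   /* keep missing values missing */
    else if z{i} < -1 then grp{i} = 1;  /* below mean - SD */
    else if z{i} <= 1 then grp{i} = 2;  /* within one SD of the mean */
    else grp{i} = 3;                    /* above mean + SD */
  end;
  drop i;
run;
```

WANT holds the z-scores (z1-z9) and the nine category variables. If the original values are needed as well, they would have to be merged back in or copied before standardizing.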
This is very easy with IML code.
proc iml;
use sashelp.class;
read all var _num_ into x[c=vname];
close;
mean=mean(x);
std=std(x);
want=j(nrow(x),ncol(x),.);
do i=1 to ncol(x);
  cutpoint=min(x[,i])||(mean[i]-std[i])||(mean[i]+std[i])||max(x[,i]);
  want[,i]=bin(x[,i],cutpoint);
end;
print want[c=vname];
run;