Hi all,
I need to categorize many variables (var1-var9) based on their mean and SD. I would like to ask for syntax (a macro?) to do this faster and more accurately.
///
var1_3cat=.;
if var1 < (mean-sd) then var1_3cat=1;
if var1 >= (mean-sd) & var1 <= (mean+sd) then var1_3cat=2;
if var1 >(mean+sd) then var1_3cat=3;
///
Thank you!
haoduonge
How are the mean and sd calculated? Are they unique to each variable or from a population value or from all of the variables?
You may not need a macro. Look at PROC STDIZE or PROC STANDARD to standardize the variables, which is essentially what you're trying to do here.
The mean and SD are calculated with PROC MEANS and are unique to each variable.
Thanks
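For per-variable means and SDs taken from PROC MEANS, one way to avoid a macro is to write the statistics to a one-row data set and loop over arrays. This is only a minimal sketch under assumptions: the input data set name `have`, the statistic names `m1-m9` and `s1-s9`, and the output name `want` are made up for illustration. Missing values are left missing here rather than falling into category 1, since SAS treats a missing value as smaller than any number.

```sas
/* Assumed input: data set HAVE with numeric variables var1-var9 */
proc means data=have noprint;
   var var1-var9;
   /* one-row data set holding each variable's mean and SD */
   output out=stats mean=m1-m9 std=s1-s9;
run;

data want;
   /* read the statistics once; values from SET are retained for every row */
   if _n_ = 1 then set stats(keep=m1-m9 s1-s9);
   set have;
   array v{9}  var1-var9;
   array m{9}  m1-m9;
   array s{9}  s1-s9;
   array c3{9} var1_3cat var2_3cat var3_3cat var4_3cat var5_3cat
               var6_3cat var7_3cat var8_3cat var9_3cat;
   do i = 1 to 9;
      if missing(v{i}) then c3{i} = .;              /* keep missing as missing */
      else if v{i} <  m{i} - s{i} then c3{i} = 1;   /* below mean - SD */
      else if v{i} <= m{i} + s{i} then c3{i} = 2;   /* within one SD, inclusive */
      else c3{i} = 3;                               /* above mean + SD */
   end;
   drop i m1-m9 s1-s9;
run;
```

The cutoffs follow the original post: category 2 includes both boundary values.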
I would simply create an informat, calculate z-scores, and apply the informat to that calculation, e.g.:
proc format;
  invalue zcat
    . = .
    low -< -1 = 1
    1.0000000001 - high = 3
    other = 2;
run;

data have;
  input score;
  mean=6;
  std=2;
  var1_3cat=input((score-mean)/std, zcat.);
  cards;
2
3
4
.
5
6
7
8
9
10
;
Art, CEO, AnalystFinder.com
@haoduonge wrote:
Mean and SD are calculated from proc means and unique to each variable.
Thanks
Proc stdize is a good option. Look at the METHOD options on the PROC statement.
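A rough sketch of the PROC STDIZE route, assuming an input data set named `have` (the names `z` and `z_cat` are also made up). METHOD=STD centers each variable on its mean and scales by its standard deviation, so the three categories become cutoffs at -1 and +1. Note that in this sketch var1-var9 in the output hold the z-scores, not the original values, and values sitting exactly on a cutoff may be affected by floating-point rounding.

```sas
/* Standardize each variable: (value - mean) / SD */
proc stdize data=have out=z method=std;
   var var1-var9;
run;

data z_cat;
   set z;
   array zv{9} var1-var9;   /* now z-scores */
   array c3{9} var1_3cat var2_3cat var3_3cat var4_3cat var5_3cat
               var6_3cat var7_3cat var8_3cat var9_3cat;
   do i = 1 to 9;
      if missing(zv{i}) then c3{i} = .;   /* keep missing as missing */
      else if zv{i} < -1 then c3{i} = 1;  /* more than 1 SD below the mean */
      else if zv{i} <= 1 then c3{i} = 2;  /* within 1 SD of the mean */
      else c3{i} = 3;                     /* more than 1 SD above the mean */
   end;
   drop i;
run;
```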
This is very easy with IML code.
proc iml;
use sashelp.class;
read all var _num_ into x[c=vname];
close;
mean=mean(x);
std=std(x);
want=j(nrow(x),ncol(x),.);
do i=1 to ncol(x);
  cutpoint=min(x[,i])||(mean[i]-std[i])||(mean[i]+std[i])||max(x[,i]);
  want[,i]=bin(x[,i],cutpoint);
end;
print want[c=vname];
quit;