Hi,
I am trying to calculate z-scores for the variables in my dataset using PROC STANDARD. Each column has a different mean and standard deviation, so my question is: should I use a common mean and standard deviation to calculate the z-scores, or should I calculate them separately for each variable?
In terms of code:
PROC STANDARD
DATA = X
MEAN = 0
STD = 1
OUT = ZSCORE;
VAR
A /* it has a mean of 5 and std of 5 */
B /* it has a mean of 500 and std of 7 */
C /* it has a mean of 900 and std of 1000 */
;
run;
Or should I use this approach?
PROC STANDARD
DATA = X
MEAN = 5
STD = 5
OUT = ZSCORE_a;
VAR
A /* it has a mean of 5 and std of 5 */
;
run;
PROC STANDARD
DATA = X
MEAN = 500
STD = 7
OUT = ZSCORE_b;
VAR
B /* it has a mean of 500 and std of 7 */
;
run;
PROC STANDARD
DATA = X
MEAN = 900
STD = 1000
OUT = ZSCORE_c;
VAR
C /* it has a mean of 900 and std of 1000 */
;
run;
and then merge all the columns?
I really appreciate your time and guidance.
Thanks!
Your first block of code will standardize all three variables to a mean of 0 and a standard deviation of 1, which is a z score. None of the other code blocks will give z scores; they will instead give rescaled scores that look very much like the raw scores, because you are standardizing to the sample mean and standard deviation.
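To see why, here is a sketch of the arithmetic, assuming PROC STANDARD applies the usual linear rescaling. The symbols are notation introduced for illustration: $\bar{x}$ and $s$ are the sample mean and standard deviation of a variable, and $M$ and $S$ are the values given to MEAN= and STD=.

```latex
\begin{align*}
x' &= M + S \cdot \frac{x - \bar{x}}{s}
   && \text{general rescaling} \\
x' &= 0 + 1 \cdot \frac{x - 5}{5} = \frac{x - 5}{5}
   && \text{variable A with MEAN=0, STD=1: a z score} \\
x' &= 5 + 5 \cdot \frac{x - 5}{5} = x
   && \text{variable A with MEAN=5, STD=5: the raw score, unchanged}
\end{align*}
```

The same cancellation happens for B and C when MEAN= and STD= are set to their own sample values.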
Steve Denham
Steve, thanks for your reply!
So would it be fair to standardize my data to a mean of 0 and a std of 1, given that all my variables have different means and stds? Or should I take some kind of average of the means and stds and plug that into my first block of code?
If you want z scores, use your first block of code exactly as it is. The mean= and std= options give the TARGET values, not the values of your sample.
Another approach is PROC STDIZE. Something like this:
proc stdize data=X out=zscore sprefix=z_ oprefix=orig;
var A B C;
run;
This will give an output dataset with the original variables prefixed with orig and the z scores prefixed with z_.
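One way to check the result, as a minimal sketch: it assumes the PROC STDIZE step above ran as written, so the output dataset zscore contains z_A, z_B and z_C. Each of them should show a mean of about 0 and a standard deviation of about 1.

```sas
/* Sanity check (illustrative): summarize the standardized columns.   */
/* Assumes zscore holds z_A, z_B, z_C created by SPREFIX=z_ above.    */
proc means data=zscore n mean std maxdec=4;
   var z_A z_B z_C;
run;
```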
I hope this helps.
Steve Denham