Question regarding PROC STANDARD

06-20-2012 12:07 PM

Hi,

I am trying to calculate z-scores for the variables in my dataset using PROC STANDARD. Each column has a different mean and standard deviation, so my question is: should I use one common mean and std for calculating the z-scores, or should I standardize each variable separately with its own values?

In terms of code:

PROC STANDARD DATA = X MEAN = 0 STD = 1 OUT = ZSCORE;
  VAR A    /* it has a mean of 5 and std of 5 */
      B    /* it has a mean of 500 and std of 7 */
      C;   /* it has a mean of 900 and std of 1000 */
run;

Or should I use this approach?

PROC STANDARD DATA = X MEAN = 5 STD = 5 OUT = ZSCORE_a;
  VAR A;   /* it has a mean of 5 and std of 5 */
run;

PROC STANDARD DATA = X MEAN = 500 STD = 7 OUT = ZSCORE_b;
  VAR B;   /* it has a mean of 500 and std of 7 */
run;

PROC STANDARD DATA = X MEAN = 900 STD = 1000 OUT = ZSCORE_c;
  VAR C;   /* it has a mean of 900 and std of 1000 */
run;

and then merge all the columns?

I really appreciate your time and guidance.

Thanks!

Accepted Solutions

06-20-2012 02:08 PM

If you want z scores, use your first block of code exactly as it is. The mean= and std= options give the TARGET values, not the values of your sample.
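To see the "target values" point in action, here is a minimal sketch on made-up data. The dataset names x_demo and zscore_demo and the three rows of values are hypothetical, chosen only so that the three variables sit on very different scales.

```sas
/* Hypothetical toy data: three variables on very different scales */
data x_demo;
  input A B C;
  datalines;
1 495 -300
5 500 900
9 505 2100
;
run;

/* MEAN= and STD= are the targets for the OUTPUT, applied to each variable
   separately using that variable's own sample mean and std */
proc standard data=x_demo mean=0 std=1 out=zscore_demo;
  var A B C;
run;

/* Each of A, B and C in zscore_demo should now have mean 0 and std 1 */
proc means data=zscore_demo n mean std;
  var A B C;
run;
```

Although A, B and C start out with different means and standard deviations, a single MEAN=0 STD=1 specification handles all of them, because PROC STANDARD estimates the sample statistics per variable on its own.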

Another approach is PROC STDIZE. Something like this:

proc stdize data=X out=zscore sprefix=z_ oprefix=orig;
  var A B C;
run;

This will give an output dataset with the original variables prefixed with orig and the z scores prefixed with z_.
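A quick way to confirm the result is to summarize the standardized columns. This is only a sketch: it assumes the PROC STDIZE step above has already run and that the standardized variables carry the z_ prefix, and it uses a name-prefix list so it does not depend on the exact variable names.

```sas
/* Assumes the PROC STDIZE step above has already created the dataset zscore */
proc means data=zscore n mean std min max;
  var z_: ;   /* name-prefix list: every variable whose name starts with z_ */
run;

/* Look at the first few rows to compare original and standardized columns */
proc print data=zscore(obs=5);
run;
```

If the standardization worked as intended, each z_ variable should show a mean of 0 and a standard deviation of 1, up to rounding.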

I hope this helps.

Steve Denham

All Replies


06-20-2012 01:47 PM

Your first block of code will standardize all three variables to a mean of 0, and a standard deviation of 1. This would be a z score. None of the other code blocks will give z scores, but will instead give scaled scores that will look very much like the raw scores, as you are standardizing to the sample mean and standard deviation.
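For anyone who wants to see the arithmetic behind this, the z scores can also be computed by hand as a cross-check. The sketch below assumes the dataset X with numeric variables A, B and C from the question; the output name zscore_manual and the z_A, z_B, z_C column names are made up for illustration.

```sas
/* PROC STANDARD computes, per variable:
     new = MEAN + STD * (x - sample_mean) / sample_std
   With MEAN=0 and STD=1 this reduces to the usual z score.
   With MEAN= and STD= set to the sample values, new is just x again. */
proc sql;
  create table zscore_manual as
  select A, B, C,
         (A - mean(A)) / std(A) as z_A,
         (B - mean(B)) / std(B) as z_B,
         (C - mean(C)) / std(C) as z_C
  from X;   /* PROC SQL remerges the summary statistics onto every row */
quit;
```

The STD function in PROC SQL uses the n-1 divisor, which matches the PROC STANDARD default (VARDEF=DF), so the two approaches should agree.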

Steve Denham


06-20-2012 01:55 PM

Steve, thanks for your reply!

So would it be fair to standardize my data to a mean of 0 and std of 1, given that all my variables have different means and stds? Or should I take some kind of average of the means and stds and plug those into my first block of code?
