Calcite | Level 5

## Standard Deviation of All Observations of Multiple Variables

Hi folks,

I need to compute Standard Deviation of ALL Observations of MULTIPLE Variables.

PROC MEANS can compute std of EACH variable.

STD function can compute std of EACH observation.

Is there a way to compute std of all observations (data points) of multiple variables?

For example, I have 3 variables, var1, var2, var3 and 3 observations.

var1 var2 var3

1      2       3

4      5       6

7      8       9

PROC MEANS can compute std of EACH variable and return std1 of (1, 4, 7), std2 of (2, 5, 8), std3 of (3, 6, 9).

STD function in data step can compute std of EACH observation and return std_1_ of (1, 2, 3), std_2_ of (4, 5, 6) and std_3_ of (7, 8, 9).

But I want to get ONE single standard deviation of all 9 data points, std_all of (1,2,3,4,5,6,7,8,9).

Thanks.

Accepted Solutions
Opal | Level 21

## Re: Standard Deviation of All Observations of Multiple Variables

There are a number of ways to do it.  I, personally, would use:

data have;

input var1-var3;

recnum=_n_;

cards;

1 2 3

4 5 6

7 8 9

;

proc transpose data=have out=need;

by recnum;

run;

proc means data=need std;

var col1;

run;

Opal | Level 21

## Standard Deviation of All Observations of Multiple Variables

Richard,

I may or may not understand what you are trying to do.  Does the following do what you want?:

proc means data=sashelp.class std;

var _numeric_;

run;

Calcite | Level 5

## Re: Standard Deviation of All Observations of Multiple Variables

art297, thanks. But I think your approach will return a separate std for EACH numeric variable, not ONE std of ALL elements of ALL numeric variables.

I just revised my original post to clarify my question. Thanks.

Super User

## Re: Standard Deviation of All Observations of Multiple Variables

PROC MEANS computes std one variable at a time, so to use all the data you need to stack the values of all the variables into a single long variable first. For example:

data want;

set have;

var=var1;output;

var=var2;output;

.....

drop var1-var3;

run;

If you have a lot of variables, then use an array.

Ksharp
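A minimal sketch of the array version suggested above, assuming the `have` data set with `var1`-`var3` from the question. The names `want`, `v`, `i` and `value` are only illustrative:

```sas
/* Stack var1-var3 into one long variable, one output row per value. */
data want;
  set have;
  array v{*} var1-var3;   /* assumes the three variables from the question */
  do i = 1 to dim(v);
    value = v{i};
    output;
  end;
  keep value;
run;

/* One std over every stacked value. */
proc means data=want std;
  var value;
run;
```

For the 3 x 3 example this should report a std of about 2.7386. With more variables, only the variable list in the ARRAY statement needs to change.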

Rhodochrosite | Level 12

## Re: Standard Deviation of All Observations of Multiple Variables

Hi .. another idea ...

* 4,000 values;

data x;

do j=1 to 1000;

a = 100*ranuni(999);

b = 100*ranuni(999);

c = 100*ranuni(999);

d = 100*ranuni(999);

output;

end;

drop j;

run;

* macro variable can hold up to 64K characters;

proc sql noprint;

select catx(',', a, b, c, d) into :nnn  separated by ',' from x;

quit;

data _null_;

std = std(&nnn);

put "STANDARD DEVIATION:  " std;

run;

Rhodochrosite | Level 12

## Re: Standard Deviation of All Observations of Multiple Variables

Hi ... got a tip from a friend, down to one PROC ...

data x;

do j=1 to 1000;

a = 100*ranuni(999);

b = 100*ranuni(999);

c = 100*ranuni(999);

d = 100*ranuni(999);

output;

end;

drop j;

run;

proc sql noprint;

select catx(',', a, b, c, d) into :nnn  separated by ',' from x;

reset print ;

select std(&nnn) "STANDARD DEVIATION" from x(obs=1) ;

quit;

Opal | Level 21

## Re: Standard Deviation of All Observations of Multiple Variables

Mike,

Glad you posted this as I hadn't realized that one could create and use a macro variable within one proc sql run.  However, I believe that the OP wanted the sd per record, not for the entire file.

Art

Quartz | Level 8

## Standard Deviation of All Observations of Multiple Variables

If you don't want to reshape the data, a roll-your-own approach with a Double DoW can do it. Something like:

data have;

input var1-var3;

cards;

1 2 3

4 5 6

7 8 9

;

data _null_ ;

do until (lastobs) ;

set have end=lastobs ;

n_v +   n(of var:) ;

sum_v + sum(of var:) ;

end ;

lastobs = 0 ;

do until (lastobs) ;

set have end=lastobs ;

array vv[*] var: ;

do j = 1 to dim(vv) ;

sumsq + ( vv[j] - (sum_v / n_v) )**2 ;

end ;

end ;

std = sqrt( sumsq/(n_v-1) ) ;

put std= ;

run  ;


## Standard Deviation of All Observations of Multiple Variables

`Howles wrote: If you don't want to reshape the data, a roll-your-own approach with a Double DoW can do it.`

I think one pass will suffice.

data _null_;

do until(lastobs);

set have end=lastobs;

uss = sum(uss,uss(of var:));

sum = sum(sum,of var:);

n   = sum(n,n(of var:));

end;

std = sqrt((uss-(sum**2/n))/(n-1));

put std=;

stop;

run;
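For reference, the one-pass step above appears to rest on the standard shortcut identity for the sample variance, which needs only the count, the plain sum, and the uncorrected sum of squares (what `uss` accumulates). Written out, with the nine values from the question as a check:

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}
$$

$$
n = 9,\quad \sum x_i = 45,\quad \sum x_i^2 = 285,\quad s = \sqrt{\frac{285 - 45^2/9}{8}} = \sqrt{7.5} \approx 2.7386
$$

One caveat: the shortcut form can lose precision when the mean is large relative to the spread, so the two-pass version may still be the safer choice for such data.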
