## Calculation within a data step

Solved
Occasional Contributor
Posts: 12

# Calculation within a data step

Good Morning,  I am willing to bet this isn't that hard, but with my little experience with SAS, I haven't figured this one out.

I have a data set with 2 variables being used as classification variables, Product and Machine. For each observation, I have a numeric value for TEST

I want to create for each observation a new variable DIF which is the result of this calculation: DIF= TEST - (Average TEST for all observations for each combination of product and machine)

Basically I want to calculate the difference between the observation versus the average result for each Class (Product*Machine*) and put that in the DIF variable for each observation.

Thanks!

Kim

Accepted Solutions
Solution
‎06-12-2012 09:44 AM
Frequent Contributor
Posts: 95

## Re: Calculation within a data step

If you want the difference between TEST and the average TEST of that Class you can make use of 1st scenario.

Second and third ones are average of average group means and overall mean for TEST, respectively.

data x;

input Product Machine Test;

datalines;

0 0 1

0 0 4

0 0 3

0 1 2

0 1 4

1 0 3

1 0 7

1 1 4

1 1 2

;

run;

proc sort data=x;

by Product Machine;

run;

/* Difference between TEST and group mean */

proc summary data=x nway missing;

class Product Machine;

var Test;

output Out=x_group_mean(drop=_

mean=Mean;

run;

data want_dif_group_mean;

merge x x_group_mean;

by Product Machine;

DIF = Test - Mean;

run;

/* Difference between TEST and mean of group means */

proc summary data=x_group_mean nway missing;

var Mean;

output Out=x_mean_of_group_mean(drop=_

mean=Mean;

run;

data want_dif_mean_of_group_mean;

if _n_ = 1 then set x_mean_of_group_mean;

set x ;

DIF = Test - Mean;

run;

/* Difference between TEST and overall mean */

proc summary data=x nway missing;

var Test;

output Out=x_overall_mean(drop=_

mean=Mean;

run;

data want_dif_overall_mean;

if _n_ = 1 then set x_overall_mean;

set x ;

DIF = Test - Mean;

run;

All Replies
Super Contributor
Posts: 1,636

## Re: Calculation within a data step

do you use proc sql?

data have;

input product \$ machine \$ value;

cards;

a aa 20

a bb 30

a cc 40

b aa 20

b bb 30

b cc 40

a aa 200

a bb 300

a cc 400

b aa 200

b bb 300

b cc 400

;

proc sql;

create table want as

select *,value-mean(value) as dif

from have

group by product,machine;

quit;

proc print;run;

obs    product    machine     dif

1       a         aa        -90
2       a         bb       -135
3       a         cc       -180
4       b         aa        -90
5       b         bb       -135
6       b         cc       -180
7       a         aa         90
8       a         bb        135
9       a         cc        180
10       b         aa         90
11       b         bb        135
12       b         cc        180

Linlin

Message was edited by: Linlin

Occasional Contributor
Posts: 12

## Re: Calculation within a data step

I have never used PROC SQL - I will need to learn this! I am not quite sure how to code the SQL, so I will spend some time better understanding that procedure. Thank you!

It amazes me how quickly you all coem up with code! Thanks everyone!

Kim

Solution
‎06-12-2012 09:44 AM
Frequent Contributor
Posts: 95

## Re: Calculation within a data step

If you want the difference between TEST and the average TEST of that Class you can make use of 1st scenario.

Second and third ones are average of average group means and overall mean for TEST, respectively.

data x;

input Product Machine Test;

datalines;

0 0 1

0 0 4

0 0 3

0 1 2

0 1 4

1 0 3

1 0 7

1 1 4

1 1 2

;

run;

proc sort data=x;

by Product Machine;

run;

/* Difference between TEST and group mean */

proc summary data=x nway missing;

class Product Machine;

var Test;

output Out=x_group_mean(drop=_

mean=Mean;

run;

data want_dif_group_mean;

merge x x_group_mean;

by Product Machine;

DIF = Test - Mean;

run;

/* Difference between TEST and mean of group means */

proc summary data=x_group_mean nway missing;

var Mean;

output Out=x_mean_of_group_mean(drop=_

mean=Mean;

run;

data want_dif_mean_of_group_mean;

if _n_ = 1 then set x_mean_of_group_mean;

set x ;

DIF = Test - Mean;

run;

/* Difference between TEST and overall mean */

proc summary data=x nway missing;

var Test;

output Out=x_overall_mean(drop=_

mean=Mean;

run;

data want_dif_overall_mean;

if _n_ = 1 then set x_overall_mean;

set x ;

DIF = Test - Mean;

run;

Posts: 3,167

## Re: Calculation within a data step

Or 2XDOW will do:

data x;

input Product Machine Test;

datalines;

0 0 1

0 0 4

0 0 3

0 1 2

0 1 4

1 0 3

1 0 7

1 1 4

1 1 2

;

run;

proc sort data=x;

by Product Machine;

run;

data want;

do _n_=1 by 1 until (last.machine);

set x;

by product machine;

_total+test;

if last.machine then _mean=_total/_n_;

end;

do until (last.machine);

set x;

by product machine;

dif=test-_mean;

output;

end;

call missing (_total);

drop _:;

run;

proc print;run;

Haikuo

Valued Guide
Posts: 765

## Re: Calculation within a data step

hi ... using Linlin's data (no sort needed) ...

proc summary data=have nway;

class product machine;

var value;

output out=stats (drop=_: index=(pm=(product machine))) mean=mean;

run;

data want (drop=mean);

set have;

set stats key=pm / unique;

diff = value-mean;

run;

Super Contributor
Posts: 1,636

## Re: Calculation within a data step

or use hash:

data want;

if _n_=1 then do;

if 0 then set stats;

declare hash h(dataset:'stats');

h.definekey('product','machine');

h.definedata('mean');

h.definedone();

end;

set have;

if h.find()=0 then do;

dif=value-mean;

output;

end;

run;

proc print;run;

🔒 This topic is solved and locked.