DATA Step, Macro, Functions and more

Calculation within a data step

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Calculation within a data step

Good Morning,  I am willing to bet this isn't that hard, but with my little experience with SAS, I haven't figured this one out.

I have a data set with 2 variables being used as classification variables, Product and Machine. For each observation, I have a numeric value for TEST

I want to create for each observation a new variable DIF which is the result of this calculation: DIF= TEST - (Average TEST for all observations for each combination of product and machine)

Basically I want to calculate the difference between the observation versus the average result for each Class (Product*Machine*) and put that in the DIF variable for each observation.

Thanks!

Kim


Accepted Solutions
Solution
‎06-12-2012 09:44 AM
Frequent Contributor
Posts: 95

Re: Calculation within a data step

Posted in reply to Kimberley

If you want the difference between TEST and the average TEST of that Class you can make use of 1st scenario.

Second and third ones are average of average group means and overall mean for TEST, respectively.

data x;

  input Product Machine Test;

datalines;

0 0 1

0 0 4

0 0 3

0 1 2

0 1 4

1 0 3

1 0 7

1 1 4

1 1 2

;

run;

proc sort data=x;

  by Product Machine;

run;

/* Difference between TEST and group mean */

proc summary data=x nway missing;

  class Product Machine;

  var Test;

  output Out=x_group_mean(drop=_Smiley Happy

  mean=Mean;

run;

data want_dif_group_mean;

  merge x x_group_mean;

  by Product Machine;

  DIF = Test - Mean;

run;

/* Difference between TEST and mean of group means */

proc summary data=x_group_mean nway missing;

  var Mean;

  output Out=x_mean_of_group_mean(drop=_Smiley Happy

  mean=Mean;

run;

data want_dif_mean_of_group_mean;

  if _n_ = 1 then set x_mean_of_group_mean;

  set x ;

  DIF = Test - Mean;

run;

/* Difference between TEST and overall mean */

proc summary data=x nway missing;

  var Test;

  output Out=x_overall_mean(drop=_Smiley Happy

  mean=Mean;

run;

data want_dif_overall_mean;

  if _n_ = 1 then set x_overall_mean;

  set x ;

  DIF = Test - Mean;

run;

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Calculation within a data step

Posted in reply to Kimberley

do you use proc sql?

data have;

input product $ machine $ value;

cards;

a aa 20

a bb 30

a cc 40

b aa 20

b bb 30

b cc 40

a aa 200

a bb 300

a cc 400

b aa 200

b bb 300

b cc 400

;

proc sql;

  create table want as

    select *,value-mean(value) as dif

   from have

     group by product,machine;

quit;

proc print;run;

                              obs    product    machine     dif

                                  1       a         aa        -90
                                  2       a         bb       -135
                                  3       a         cc       -180
                                  4       b         aa        -90
                                  5       b         bb       -135
                                  6       b         cc       -180
                                  7       a         aa         90
                                  8       a         bb        135
                                  9       a         cc        180
                                 10       b         aa         90
                                 11       b         bb        135
                                 12       b         cc        180

Linlin

Message was edited by: Linlin

Occasional Contributor
Posts: 12

Re: Calculation within a data step

I have never used PROC SQL - I will need to learn this! I am not quite sure how to code the SQL, so I will spend some time better understanding that procedure. Thank you!

It amazes me how quickly you all coem up with code! Thanks everyone!

Kim

Solution
‎06-12-2012 09:44 AM
Frequent Contributor
Posts: 95

Re: Calculation within a data step

Posted in reply to Kimberley

If you want the difference between TEST and the average TEST of that Class you can make use of 1st scenario.

Second and third ones are average of average group means and overall mean for TEST, respectively.

data x;

  input Product Machine Test;

datalines;

0 0 1

0 0 4

0 0 3

0 1 2

0 1 4

1 0 3

1 0 7

1 1 4

1 1 2

;

run;

proc sort data=x;

  by Product Machine;

run;

/* Difference between TEST and group mean */

proc summary data=x nway missing;

  class Product Machine;

  var Test;

  output Out=x_group_mean(drop=_Smiley Happy

  mean=Mean;

run;

data want_dif_group_mean;

  merge x x_group_mean;

  by Product Machine;

  DIF = Test - Mean;

run;

/* Difference between TEST and mean of group means */

proc summary data=x_group_mean nway missing;

  var Mean;

  output Out=x_mean_of_group_mean(drop=_Smiley Happy

  mean=Mean;

run;

data want_dif_mean_of_group_mean;

  if _n_ = 1 then set x_mean_of_group_mean;

  set x ;

  DIF = Test - Mean;

run;

/* Difference between TEST and overall mean */

proc summary data=x nway missing;

  var Test;

  output Out=x_overall_mean(drop=_Smiley Happy

  mean=Mean;

run;

data want_dif_overall_mean;

  if _n_ = 1 then set x_overall_mean;

  set x ;

  DIF = Test - Mean;

run;

Respected Advisor
Posts: 3,156

Re: Calculation within a data step

Or 2XDOW will do:

data x;

  input Product Machine Test;

datalines;

0 0 1

0 0 4

0 0 3

0 1 2

0 1 4

1 0 3

1 0 7

1 1 4

1 1 2

;

run;

proc sort data=x;

  by Product Machine;

run;

data want;

  do _n_=1 by 1 until (last.machine);

  set x;

by product machine;

_total+test;

if last.machine then _mean=_total/_n_;

end;

do until (last.machine);

  set x;

by product machine;

dif=test-_mean;

output;

end;

call missing (_total);

drop _:;

run;

proc print;run;

Haikuo

Valued Guide
Posts: 765

Re: Calculation within a data step

Posted in reply to Kimberley

hi ... using Linlin's data (no sort needed) ...

proc summary data=have nway;

class product machine;

var value;

output out=stats (drop=_: index=(pm=(product machine))) mean=mean;

run;

data want (drop=mean);

set have;

set stats key=pm / unique;

diff = value-mean;

run;

Super Contributor
Posts: 1,636

Re: Calculation within a data step

or use hash:

data want;

  if _n_=1 then do;

   if 0 then set stats;

    declare hash h(dataset:'stats');

  h.definekey('product','machine');

  h.definedata('mean');

  h.definedone();

  end;

  set have;

    if h.find()=0 then do;

      dif=value-mean;

      output;

                    end;

  run;

  proc print;run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 257 views
  • 7 likes
  • 5 in conversation