TX_STAR
Obsidian | Level 7

Hi, I am new to SAS. I am doing a simulation study with 100 replications, which generates 100 outputs. There are 50 observations in each of the 100 outputs. I want to calculate the mean of each observation over the 100 replications. How do I do this?

13 REPLIES 13
Astounding
PROC Star

ZhenLi,

Many posters could solve this, with a little more information.

Do you already have 100 SAS data sets?  As a follow-up question, would it be easy for you to assemble them into a single SAS data set, with a new variable that takes on values from 1 to 100 indicating the "replication"?

For each variable, do you want to get the mean separately for each replication, and then get the mean of those 100 means?

TX_STAR
Obsidian | Level 7

Hi,

Thank you.

I already have 100 outputs; below are the first 10 of the 50 observations in one output. I need to calculate the mean of each observation over the 100 outputs, for each of 11 variables.

1  0.27189  -1.63333  0.21903  0.348507 -1.250858  0.319263 -1.296945  0.343462 -1.216949  0.352209 -1.198076
2  0.36986  -1.48808  0.19807  0.474084 -1.137540  0.434303 -1.173247  0.467221 -1.101967  0.479120 -1.085949
3  0.33389  -1.32761  0.00000  0.427978 -1.012348  0.392066 -1.036588  0.421783 -0.974936  0.432524 -0.962073
4  0.37094  -2.03541  0.20296  0.475468 -1.564543  0.435571 -1.639363  0.468586 -1.535242  0.480519 -1.508464
5  0.23689  -1.26799  0.22502  0.303644 -0.965835  0.278165 -0.985815  0.299249 -0.927740  0.306869 -0.916049
6  0.51489  -0.60272  0.20248  0.659983 -0.446820  0.604603 -0.419260  0.650429 -0.401102  0.666993 -0.402489
7  0.36227  -1.08792  0.18017  0.464355 -0.825352  0.425391 -0.832464  0.457633 -0.785194  0.469288 -0.777042
8  0.38185  -2.14096  0.20688  0.489453 -1.646889  0.448382 -1.729251  0.482368 -1.618797  0.494652 -1.589944
9  0.31068  -1.13604  0.19791  0.398228 -0.862894  0.364812 -0.873444  0.392463 -0.823286  0.402458 -0.814189
10  0.38501  0.07645  0.21471  0.493503  0.083039  0.452093  0.159133  0.486360  0.136540  0.498745  0.121801
.

Partial code is in the attached file

Astounding
PROC Star

ZhenLi,

So far, so good.  But the objective is still a little hazy.

Do you need one mean for each observation in each data set, getting the average of all 11 variables within a single observation?

Do you need 11 means per data set (the mean of each variable for all 50 observations in each data set)?

Describe the formulas you would like to apply.

Good luck.

TX_STAR
Obsidian | Level 7

Astounding,

Sorry for the confusion.

There are 11 variables and 50 observations in each of 100 outputs.

I need to calculate 11 means for the 11 variables for each of the 50 observations. It will be the mean over 100 outputs for each variable of that observation. 

In my code, I was trying to read the same observation (e.g., observation 1) from all 100 outputs, save them in one data file, and then get summary statistics.

I used append but it does not work very well.

Z.

Astounding
PROC Star

ZhenLi,

OK, just to make sure that I understand ...

It sounds like you need 550 means in total.

The mean of the first variable, based on 100 observations (1st observation from data set 1, plus first observation from data set 2, ... first observation from data set 100).

The mean of the first variable, based on 100 observations (2nd observation from data set 1, plus second observation from data set 2, ... second observation from data set 100).

The last mean would be:

The mean of the 11th variable, based on 100 observations (50th observation from data set 1, plus 50th observation from data set 2, ... 50th observation from data set 100).
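In symbols (the notation is introduced here only for illustration and is not from the original posts): write x_{ij}^{(r)} for the value of variable j in observation i of data set r. The quantity described above would then be

```latex
\bar{x}_{ij} = \frac{1}{100} \sum_{r=1}^{100} x_{ij}^{(r)},
\qquad i = 1, \dots, 50, \quad j = 1, \dots, 11,
```

which gives 50 × 11 = 550 means in total.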

Does this sound right?

TX_STAR
Obsidian | Level 7

yes. Thank you. Z

Astounding
PROC Star

ZhenLi,

OK, here's an approach that is easily adaptable to using macro language, but you'll have to write the macro.

Good luck.

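data combine_all_100;
   /* rownum restarts at 0 for each input data set, so it records the
      position (1-50) of every observation within its own data set */
   rownum=0;
   do until (done1);
        set first_data_set end=done1;
        rownum + 1;
        output;
   end;
   rownum=0;
   do until (done2);
       set second_data_set end=done2;
       rownum + 1;
       output;
   end;
   ....
   rownum=0;
   do until (done100);
       set hundredth_data_set end=done100;
       rownum + 1;
       output;
   end;
   stop;
run;

/* one set of means per rownum, each based on 100 observations */
proc means data=combine_all_100;
   /* placeholder: replace with the names of your 11 numeric variables */
   var list of numerics excluding rownum;
   class rownum;
run;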
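If the 100 data sets happen to share a numbered naming pattern, the stacking step might not need a macro at all. The sketch below is only an illustration under assumptions that are not in the original posts: the data sets are called out1-out100, the 11 numeric variables are called v1-v11, and the SAS release is 9.2 or later (needed for the data set list and the INDSNAME= option). The names source and means_by_row are likewise made up for the example.

```sas
/* Sketch only: out1-out100 and v1-v11 are hypothetical names. */
data combine_all_100;
   /* source holds the name of the data set that supplied the current row */
   set out1-out100 indsname=source;
   /* restart the counter whenever a new input data set begins */
   if source ne lag(source) then rownum = 0;
   rownum + 1;
run;

/* one output row per rownum, holding the mean of v1-v11 over the 100 data sets */
proc means data=combine_all_100 noprint nway;
   class rownum;
   var v1-v11;
   output out=means_by_row(drop=_type_ _freq_) mean=;
run;
```

With 50 observations per data set, means_by_row should end up with 50 rows and 11 mean columns, that is, the 550 means discussed above. Writing the means to a data set rather than the listing makes them easier to reuse later.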

chang_y_chung_hotmail_com
Obsidian | Level 7

Here is one way. hth

  /* test data */
  proc plan seed=12345678;
     factors obs=50 ordered rep=100 ordered x=1 of 1000 random;
     output out=sim;
  run;

  /* mean of x over 100 reps for each obs */
  proc means data=sim;
     var x;
     by obs;
  run;
  /* on lst
  obs=1

  The MEANS Procedure

                         Analysis Variable : x

     N            Mean         Std Dev         Minimum         Maximum
  --------------------------------------------------------------------
   100     503.3600000     311.3526864      14.0000000         1000.00
  --------------------------------------------------------------------


  obs=2

                         Analysis Variable : x

     N            Mean         Std Dev         Minimum         Maximum
  --------------------------------------------------------------------
   100     483.4900000     290.8708796       8.0000000     979.0000000
  --------------------------------------------------------------------

  ...

  obs=50

                         Analysis Variable : x

     N            Mean         Std Dev         Minimum         Maximum
  --------------------------------------------------------------------
   100     467.2800000     273.5758747       2.0000000     999.0000000
  --------------------------------------------------------------------
  */

TX_STAR
Obsidian | Level 7

Hi,

Thank you.

How to use it in my question?

Ksharp
Super User

Oh, boy. You need a third dimension to calculate the mean.

I only created 4 tables for the demo. You need to change it to 100.

data a1(keep=a:);
array a{11};
do k=1 to 50;
do i=1 to dim(a);
 a{i}=ranuni(-1);
end;
 output;
end;
run;
data a2(keep=a: rename=(a1-a11=b1-b11));
array a{11};
do k=1 to 50;
do i=1 to dim(a);
 a{i}=ranuni(-1);
end;
 output;
end;
run;
data a3(keep=a: rename=(a1-a11=c1-c11));
array a{11};
do k=1 to 50;
do i=1 to dim(a);
 a{i}=ranuni(-1);
end;
 output;
end;
run;
data a4(keep=a: rename=(a1-a11=d1-d11));
array a{11};
do k=1 to 50;
do i=1 to dim(a);
 a{i}=ranuni(-1);
end;
 output;
end;
run;




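data want(keep=mean:);
 merge a1-a4;              /* one-to-one merge: row k of every table side by side */
 array _a{*} _numeric_;    /* all 44 merged variables, 11 per table, in table order */
 array mean{11};
 do i=1 to 11;
  sum=0;
  do j=i to dim(_a) by 11; /* variable i of table 1, then of table 2, ... */
   sum+_a{j};
  end;
  mean{i}=sum/4;           /* divide by the number of tables */
 end;
run;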




Ksharp

TX_STAR
Obsidian | Level 7

Ksharp, thank you very much. Is it possible to write this as a macro?

art297
Opal | Level 21

You can typically wrap most code within a macro and, since Ksharp's code didn't include a cards or datalines statement, I don't see why you couldn't.  What do you want to pass into the macro?  Just identify those values as parameters in your macro declaration.

Ksharp
Super User

No, you don't need a macro.

What you need to do is rename the variables in these 100 data sets so that every variable has a unique name.

Then use this code:

data want(keep=mean:);
 merge a1-a100;
 array _a{*} _numeric_;
 array mean{11};
 do i=1 to 11;
  sum=0;
  do j=i to dim(_a) by 11;
   sum+_a{j};
  end;
  mean{i}=sum/100;
 end;
run;

P.S. a1-a100 are your one hundred data sets.
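If renaming the variables in 100 data sets by hand is too tedious, the RENAME= options could be generated with a short macro loop. This is only a sketch under assumptions that are not in the original posts: the data sets are named out1-out100, each contains 11 numeric variables v1-v11, and the macro name mean_over_reps with its parameters n and nvars is made up for the example. The KEEP= option is there so that any other numeric variable, such as an observation number, does not shift the 11-variable stride.

```sas
/* Sketch only: out1-out100, v1-v11 and the macro name are hypothetical. */
%macro mean_over_reps(n=100, nvars=11);
   %local r;
   data want(keep=mean:);
      /* one-to-one merge; data set r contributes r<r>_v1 - r<r>_v<nvars> */
      merge
      %do r = 1 %to &n;
         out&r(keep=v1-v&nvars rename=(v1-v&nvars = r&r._v1-r&r._v&nvars))
      %end;
      ;
      array _a{*} _numeric_;   /* merged variables, &nvars per data set, in order */
      array mean{&nvars};
      do i = 1 to &nvars;
         sum = 0;
         do j = i to dim(_a) by &nvars;   /* variable i from each data set */
            sum + _a{j};
         end;
         mean{i} = sum / &n;
      end;
   run;
%mend mean_over_reps;

%mean_over_reps(n=100, nvars=11)
```

The DATA step inside the macro is the same stride-by-11 logic as above; the macro only writes out the 100 renamed data set references. Dividing by &n assumes no missing values; with missing data, a count of the nonmissing values would be needed instead.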

Ksharp

