Hi, I am new to SAS. I am running a simulation study with 100 replications, which produces 100 outputs. Each output contains 50 observations. I want to calculate the mean of each observation over the 100 replications. How can I do this?
ZhenLi,
Many posters could solve this, with a little more information.
Do you already have 100 SAS data sets? As a follow-up question, would it be easy for you to assemble them into a single SAS data set, with a new variable that takes on values from 1 to 100 indicating the "replication"?
For each variable, do you want to get the mean separately for each replication, and then get the mean of those 100 means?
Hi,
Thank you.
I already have 100 outputs; below are the first 10 of the 50 observations in one output. I need to calculate the mean of each observation over the 100 outputs, for each of 11 variables.
1 0.27189 -1.63333 0.21903 0.348507 -1.250858 0.319263 -1.296945 0.343462 -1.216949 0.352209 -1.198076
2 0.36986 -1.48808 0.19807 0.474084 -1.137540 0.434303 -1.173247 0.467221 -1.101967 0.479120 -1.085949
3 0.33389 -1.32761 0.00000 0.427978 -1.012348 0.392066 -1.036588 0.421783 -0.974936 0.432524 -0.962073
4 0.37094 -2.03541 0.20296 0.475468 -1.564543 0.435571 -1.639363 0.468586 -1.535242 0.480519 -1.508464
5 0.23689 -1.26799 0.22502 0.303644 -0.965835 0.278165 -0.985815 0.299249 -0.927740 0.306869 -0.916049
6 0.51489 -0.60272 0.20248 0.659983 -0.446820 0.604603 -0.419260 0.650429 -0.401102 0.666993 -0.402489
7 0.36227 -1.08792 0.18017 0.464355 -0.825352 0.425391 -0.832464 0.457633 -0.785194 0.469288 -0.777042
8 0.38185 -2.14096 0.20688 0.489453 -1.646889 0.448382 -1.729251 0.482368 -1.618797 0.494652 -1.589944
9 0.31068 -1.13604 0.19791 0.398228 -0.862894 0.364812 -0.873444 0.392463 -0.823286 0.402458 -0.814189
10 0.38501 0.07645 0.21471 0.493503 0.083039 0.452093 0.159133 0.486360 0.136540 0.498745 0.121801
.
Partial code is in the attached file
ZhenLi,
So far, so good. But the objective is still a little hazy.
Do you need one mean for each observation in each data set, getting the average of all 11 variables within a single observation?
Do you need 11 means per data set (the mean of each variable for all 50 observations in each data set)?
Describe the formulas you would like to apply.
Good luck.
Astounding,
Sorry for the confusion.
There are 11 variables and 50 observations in each of 100 outputs.
I need to calculate 11 means for the 11 variables for each of the 50 observations. It will be the mean over 100 outputs for each variable of that observation.
In my code, I was trying to read the same observation (e.g., observation 1) from all 100 outputs, save them in one data file, and then get summary statistics.
I used append but it does not work very well.
Z.
ZhenLi,
OK, just to make sure that I understand ...
It sounds like you need 550 means in total.
The mean of the first variable, based on 100 observations (1st observation from data set 1, plus first observation from data set 2, ... first observation from data set 100).
The mean of the first variable, based on 100 observations (2nd observation from data set 1, plus second observation from data set 2, ... second observation from data set 100).
The last mean would be:
The mean of the 11th variable, based on 100 observations (50th observation from data set 1, plus 50th observation from data set 2, ... 50th observation from data set 100).
Does this sound right?
yes. Thank you. Z
ZhenLi,
OK, here's an approach that is easily adaptable to using macro language, but you'll have to write the macro.
Good luck.
data combine_all_100;
   rownum=0;
   do until (done1);
      set first_data_set end=done1;
      rownum + 1;
      output;
   end;
   rownum=0;
   do until (done2);
      set second_data_set end=done2;
      rownum + 1;
      output;
   end;
   ....
   rownum=0;
   do until (done100);
      set hundredth_data_set end=done100;
      rownum + 1;
      output;
   end;
   stop;
run;

proc means data=combine_all_100;
   var list of numerics excluding rownum;
   class rownum;
run;
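A minimal sketch of the kind of macro mentioned above, assuming the 100 data sets are named out1 through out100 and the 11 variables are named v1 through v11. Both naming schemes are hypothetical, so substitute your own. The make_demo macro only fabricates random test data so that the example runs on its own; combine_reps generates one DO UNTIL block per data set, exactly as in the hand-written step.

```sas
/* Hypothetical names: data sets out1-out100, variables v1-v11. */

/* Build random demo data so the sketch is self-contained. */
%macro make_demo(prefix=out, n=100);
  %local r;
  %do r = 1 %to &n;
    data &prefix&r (drop=k i);
      array v{11};               /* creates v1-v11 */
      do k = 1 to 50;            /* 50 observations per output */
        do i = 1 to 11;
          v{i} = rand('normal');
        end;
        output;
      end;
    run;
  %end;
%mend make_demo;

/* Generate one DO UNTIL block per data set, resetting rownum each time. */
%macro combine_reps(prefix=out, n=100, result=combine_all_100);
  %local r;
  data &result;
    %do r = 1 %to &n;
      rownum = 0;
      do until (done&r);
        set &prefix&r end=done&r;
        rownum + 1;
        output;
      end;
    %end;
    stop;
  run;
%mend combine_reps;

%make_demo(prefix=out, n=100)
%combine_reps(prefix=out, n=100)

/* One mean per variable per row number: 50 x 11 = 550 means. */
proc means data=combine_all_100 mean;
  class rownum;
  var v1-v11;
run;
```

With real data you would skip make_demo and call combine_reps with your own prefix.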
Here is one way. hth
/* test data */
proc plan seed=12345678;
factors obs=50 ordered rep=100 ordered x=1 of 1000 random;
output out=sim;
run;
/* mean of x over 100 reps for each obs */
proc means data=sim;
var x;
by obs;
run;
/* on lst
obs=1
The MEANS Procedure
Analysis Variable : x
N Mean Std Dev Minimum Maximum
--------------------------------------------------------------------
100 503.3600000 311.3526864 14.0000000 1000.00
--------------------------------------------------------------------
obs=2
Analysis Variable : x
N Mean Std Dev Minimum Maximum
--------------------------------------------------------------------
100 483.4900000 290.8708796 8.0000000 979.0000000
--------------------------------------------------------------------
...
obs=50
Analysis Variable : x
N Mean Std Dev Minimum Maximum
--------------------------------------------------------------------
100 467.2800000 273.5758747 2.0000000 999.0000000
--------------------------------------------------------------------
*/
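To carry the same idea over to 11 variables, the 100 outputs would first need to be stacked into one data set with an obs variable (the row number within each output). Here is a rough sketch under those assumptions; the data set name stacked and the variable names v1-v11 are made up for illustration, and the first step only generates random demo data in that layout. The OUTPUT statement saves the 50 x 11 means to a data set instead of only printing them.

```sas
/* Demo data in the same long layout as SIM: one row per (rep, obs). */
/* The names stacked, rep, obs, and v1-v11 are hypothetical.         */
data stacked (drop=i);
  call streaminit(12345);
  array v{11};                   /* creates v1-v11 */
  do rep = 1 to 100;
    do obs = 1 to 50;
      do i = 1 to 11;
        v{i} = rand('normal');
      end;
      output;
    end;
  end;
run;

/* BY processing requires the data to be sorted by obs. */
proc sort data=stacked;
  by obs rep;
run;

/* MEAN= with no names stores each mean under the original variable name. */
proc means data=stacked noprint;
  by obs;
  var v1-v11;
  output out=obs_means (drop=_type_ _freq_) mean=;
run;

proc print data=obs_means (obs=5);
run;
```

The result, obs_means, has one row per obs and one column per variable.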
Hi,
Thank you.
How to use it in my question?
Oh, boy. You need a third dimension to calculate the mean.
I only created 4 tables for the demo. You need to change that to 100.
data a1(keep=a:);
   array a{11};
   do k=1 to 50;
      do i=1 to dim(a);
         a{i}=ranuni(-1);
      end;
      output;
   end;
run;

data a2(keep=a: rename=(a1-a11=b1-b11));
   array a{11};
   do k=1 to 50;
      do i=1 to dim(a);
         a{i}=ranuni(-1);
      end;
      output;
   end;
run;

data a3(keep=a: rename=(a1-a11=c1-c11));
   array a{11};
   do k=1 to 50;
      do i=1 to dim(a);
         a{i}=ranuni(-1);
      end;
      output;
   end;
run;

data a4(keep=a: rename=(a1-a11=d1-d11));
   array a{11};
   do k=1 to 50;
      do i=1 to dim(a);
         a{i}=ranuni(-1);
      end;
      output;
   end;
run;

data want(keep=mean:);
   merge a1-a4;
   array _a{*} _numeric_;
   array mean{11};
   do i=1 to 11;
      sum=0;
      do j=i to dim(_a) by 11;
         sum+_a{j};
      end;
      mean{i}=sum/4;
   end;
run;
Ksharp
Ksharp, thank you very much. Is it possible to write this as a macro?
You can typically wrap most code in a macro and, since Ksharp's code doesn't include a CARDS or DATALINES statement, I don't see why you couldn't. What do you want to pass into the macro? Just identify those values as parameters in your macro definition.
No, you don't need a macro.
What you need to do is rename the variables in these 100 data sets so that every variable name is unique across all of them.
Then use this code:
data want(keep=mean:);
   merge a1-a100;
   array _a{*} _numeric_;
   array mean{11};
   do i=1 to 11;
      sum=0;
      do j=i to dim(_a) by 11;
         sum+_a{j};
      end;
      mean{i}=sum/100;
   end;
run;
P.S. a1-a100 are your one hundred data sets.
Ksharp
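If typing 100 rename lists by hand is impractical, one possible shortcut is to let a small macro loop generate the RENAME= option for each data set inside the MERGE statement. This is only a sketch: it assumes every data set a1-a100 holds the same 11 variables named v1-v11 (a hypothetical naming scheme), and the rep1_1, rep1_2, ... names it produces are likewise made up. The make_demo macro just creates random test data so the example can run by itself.

```sas
/* Hypothetical layout: data sets a1-a100, each with variables v1-v11. */
%macro make_demo(prefix=a, n=100, nvars=11);
  %local r;
  %do r = 1 %to &n;
    data &prefix&r (keep=v:);
      array v{&nvars};
      do k = 1 to 50;
        do i = 1 to &nvars;
          v{i} = ranuni(-1);
        end;
        output;
      end;
    run;
  %end;
%mend make_demo;

%macro mean_by_merge(prefix=a, n=100, nvars=11);
  %local r;
  data want (keep=mean:);
    /* One-to-one merge; each data set gets unique names, e.g. rep7_1-rep7_11. */
    merge
      %do r = 1 %to &n;
        &prefix&r (rename=(v1-v&nvars = rep&r._1-rep&r._&nvars))
      %end;
    ;
    array _a{*} rep:;            /* all renamed variables, in merge order */
    array mean{&nvars};
    do i = 1 to &nvars;
      sum = 0;
      /* Variable i of every replication sits nvars positions apart. */
      do j = i to dim(_a) by &nvars;
        sum + _a{j};
      end;
      mean{i} = sum / &n;
    end;
  run;
%mend mean_by_merge;

%make_demo(prefix=a, n=100, nvars=11)
%mean_by_merge(prefix=a, n=100, nvars=11)
```

The DATA step body is the same as the code above; only the renaming is automated.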