Hello,
I know that with PROC MEANS I can find summary statistics for my data, such as the mean and N per variable. However, how can I then use these summary statistics in a data step, so that I can do something like find the distance of each observation from the mean and then add up these distances?
Sometimes I want to operate at the per-observation level and sometimes at the aggregate level, and I'm not sure what the approach is for switching back and forth between the two. I hope I'm making some sense.
Is this example helpful?
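proc sql noprint;
    /* store the overall mean height in macro variable mn (as text) */
    select mean(height) into :mn
    from sashelp.class;
quit;

data class;
    set sashelp.class;
    mean_height = &mn;       /* same value on every observation */
    diff = height - &mn;     /* signed distance from the mean */
run;

proc print;
run;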
Obs Name Sex Age Height Weight mean_height diff
1 Alfred M 14 69.0 112.5 62.3368 6.6632
2 Alice F 13 56.5 84.0 62.3368 -5.8368
3 Barbara F 13 65.3 98.0 62.3368 2.9632
4 Carol F 14 62.8 102.5 62.3368 0.4632
5 Henry M 14 63.5 102.5 62.3368 1.1632
6 James M 12 57.3 83.0 62.3368 -5.0368
7 Jane F 12 59.8 84.5 62.3368 -2.5368
8 Janet F 15 62.5 112.5 62.3368 0.1632
9 Jeffrey M 13 62.5 84.0 62.3368 0.1632
10 John M 12 59.0 99.5 62.3368 -3.3368
11 Joyce F 11 51.3 50.5 62.3368 -11.0368
12 Judy F 14 64.3 90.0 62.3368 1.9632
13 Louise F 12 56.3 77.0 62.3368 -6.0368
14 Mary F 15 66.5 112.0 62.3368 4.1632
15 Philip M 16 72.0 150.0 62.3368 9.6632
16 Robert M 12 64.8 128.0 62.3368 2.4632
17 Ronald M 15 67.0 133.0 62.3368 4.6632
18 Thomas M 11 57.5 85.0 62.3368 -4.8368
19 William M 15 66.5 112.0 62.3368 4.1632
Add the statistics back to your data step. A search of the forum will turn up many ways to do that. This is especially useful if you have statistics at a group level.
You can also look at some of the other stats that proc means gives you because they can be useful.
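For the group-level case mentioned above, here is a minimal sketch, assuming you want each student's height compared with the mean for their own sex in sashelp.class. The dataset names class_sorted, sex_stats, and class_by_sex and the variable name mean_height are made up for illustration; this is one of several ways to do it, not the only one.

```sas
/* BY-group processing needs the data sorted by the grouping variable */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

/* One row per sex, holding that group's mean height */
proc means data=class_sorted noprint;
    by sex;
    var height;
    output out=sex_stats(drop=_type_ _freq_) mean=mean_height;
run;

/* Merge the group means back onto the individual observations */
data class_by_sex;
    merge class_sorted sex_stats;
    by sex;
    diff = height - mean_height;   /* distance from the group mean */
run;
```

Each observation in class_by_sex then carries its own group's mean, so per-observation and per-group values sit side by side.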
Or something like:
proc means;
..
output out=stat .....;
run;
data want;
set have;
if _n_ eq 1 then set stat ;
...........
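Filled in for sashelp.class, the outline above might look like the sketch below. It assumes that "distance" means the absolute difference from the mean; the names mean_height, abs_diff, and total_abs_diff are illustrative and not part of the original outline.

```sas
/* One-row dataset holding the overall mean of height */
proc means data=sashelp.class noprint;
    var height;
    output out=stat(drop=_type_ _freq_) mean=mean_height;
run;

data want;
    set sashelp.class;
    /* Read the single summary row once; variables read by SET are
       retained, so mean_height stays available on every observation */
    if _n_ = 1 then set stat;
    diff     = height - mean_height;
    abs_diff = abs(diff);
    /* Sum statement: keeps a running total across observations */
    total_abs_diff + abs_diff;
run;
```

The last observation of want holds the grand total of the distances in total_abs_diff.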
Ksharp
You can add an ODS OUTPUT statement to collect the statistics into a new dataset, which might be helpful to you.
ODS OUTPUT summary=summary_means; /* summary_means is the new dataset */
PROC MEANS DATA= <DATASET_NAME>;
VAR years_on_farm;
RUN;
ODS OUTPUT CLOSE;
Instead of MEANS, use SQL to calculate statistics and then load them into macro variables that you reference in the data step.
e.g.
PROC SQL noprint;
    /* each macro variable in the INTO clause needs its own colon */
    select mean(age), mean(weight) into :AverageAge, :AverageWeight from your_data_set;
quit;
Data your_data_set2;
    set your_data_set;
    Age_variance=Age - &AverageAge;
    Weight_variance=weight - &AverageWeight;
run;
Forgive me if I need another cup of coffee on this one, but ...
If you compute the difference from the mean on each observation, then add up all the differences, doesn't the total have to be zero?
No because of rounding/floating point error
Yes otherwise :smileysilly:
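The "yes" can be written out in one line. With n observations x_1, ..., x_n and the mean defined as usual, the signed differences cancel exactly:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\quad\Longrightarrow\quad
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

This only applies to signed differences. If "distance" means the absolute difference |x_i - x̄| or the squared difference, the total is generally not zero.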