/* Hi Forum,
I have a dataset like the one below. It gives the income of 2 households, observed at different points in time.
Q: I want to get the min, max and mean household income for the entire sample.
I tried Approaches I and II below, which give two very different answers.
Could you please tell me which approach is correct for getting the mean income of households in the sample?
*/
data data1;
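/* Read Date as a SAS date (yyyymmdd) instead of a plain number */
input HOUSE_ID Date :yymmdd8. Income;
format Date yymmddn8.;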
cards;
111 20170101 25
111 20170208 30
111 20170617 .
333 20170623 400
333 20170705 -0.001
333 20170718 4000
;
run;
/* Approach I: */
Proc means data = data1;
Var income;
Run;
/* Answer: Min = -0.001
Max = 4000
Mean = 890.9998 */
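/* A minimal sketch, not part of the original run, of what Approach I computes.
   PROC MEANS pools all non-missing Income values, so every observation gets the
   same weight and the missing value for household 111 is left out.
   The N and NMISS keywords are added here only to make that visible
   (N = 5 and NMISS = 1 for data1).
   By hand: (25 + 30 + 400 - 0.001 + 4000) / 5 = 890.9998 */
proc means data=data1 n nmiss min max mean;
var Income;
run;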
/* Approach II: */
proc means data=data1 noprint nway; /* nway keeps only the per-household rows and drops the overall (_TYPE_=0) row */
class House_ID;
var Income;
output out=data2 mean=Income_mean;
run;
proc means data=data2;
var Income_mean;
run;
/* Answer: Min = 27.5
Max = 1466.67
Mean = 747.08 */
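/* A hedged sketch of Approach II as a single PROC SQL step, assuming data1 as
   created above. The column name mean_of_household_means is illustrative only.
   The inner query computes one mean per household (27.5 and 1466.67), and the
   outer query averages those, so each household counts once regardless of how
   many observations it has: (27.5 + 1466.6663) / 2 = 747.08.
   Approach I instead weights each observation equally, which is why the two
   means differ. Which one is wanted depends on whether the unit of analysis is
   the observation or the household. */
proc sql;
select mean(Income_mean) as mean_of_household_means
from (select House_ID, mean(Income) as Income_mean
      from data1
      group by House_ID);
quit;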
/* Thanks */