Solved
Contributor
Posts: 38

# Which "min", "max" and "average" are correct?

/* Hi Forum,

I have a dataset like the one below. It records the income of 2 households, each observed at several different dates.

Q: I want to get the min, max, and mean household income for the entire sample.

I used Approaches I and II below, and they give two very different answers.

Could you please tell me which approach is correct for getting the mean income of households in the sample?

*/

data data1;
  input HOUSE_ID Date Income;
  cards;
111 20170101 25
111 20170208 30
111 20170617 .
333 20170623 400
333 20170705 -0.001
333 20170718 4000
;
run;

/* Approach I: */

proc means data=data1;
  var Income;
run;

/* Result: Max = 4000, Mean = 890.9998 */

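/* Approach II: */

/* NWAY keeps only the per-household rows; without it, data2 would also
   contain the overall (_TYPE_=0) row, which the next step would then
   average in as if it were a household. */
proc means data=data1 noprint nway;
  class House_ID;
  var Income;
  output out=data2 mean=Income_mean;
run;

proc means data=data2;
  var Income_mean;
run;

/* Result: Max = 1466.67, Mean = 747.08 */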

/* Thanks */

Accepted Solutions
Solution
08-10-2017 12:58 PM
Super User
Posts: 11,343

## Re: Which "min", "max" and "average" are correct?

Actually, I'm going to throw a wrench in first: perhaps you only want to include the latest income for each household, or only household incomes within a specified time frame.

The "correct" one depends on what question you want to answer. If you want to discuss differences across households, first reduce the data to one value per household (latest, earliest, or mean within a time frame) and then summarize, similar to your Approach II. If the question is just about the income records in the sample, use the first approach. I would also want N and standard deviations, just to let me know if there's something unexpected about the data.
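A minimal sketch of the reduce-then-summarize route, assuming "latest non-missing income per household" is the reduction you want (it is only one of the options above). It treats `Date` as a yyyymmdd number, as in the question, and the data set names `sorted` and `latest` are hypothetical. N, NMISS and STD are added to the PROC MEANS output to help spot surprises.

```sas
/* Sample data from the question */
data data1;
  input HOUSE_ID Date Income;
  cards;
111 20170101 25
111 20170208 30
111 20170617 .
333 20170623 400
333 20170705 -0.001
333 20170718 4000
;
run;

/* Record-level summary (Approach I) with counts and spread added */
proc means data=data1 n nmiss mean std min max;
  var Income;
run;

/* Sort by household and date, dropping missing incomes first;
   yyyymmdd numbers sort in chronological order */
proc sort data=data1 out=sorted;
  by HOUSE_ID Date;
  where not missing(Income);
run;

/* Keep one record per household: the last (latest) one in each BY group */
data latest;
  set sorted;
  by HOUSE_ID;
  if last.HOUSE_ID;
run;

/* Household-level summary */
proc means data=latest n mean std min max;
  var Income;
run;
```

Without the WHERE statement, household 111's latest record would be the missing June value. For the earliest value, use `first.HOUSE_ID` instead of `last.HOUSE_ID`; to restrict to a time frame, extend the WHERE clause, e.g. `where not missing(Income) and 20170601 <= Date <= 20170731;`.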

And I would be very tempted to discard records with negative income.
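If you do discard negative incomes, a WHERE statement in the summarizing step is one way to do it. A minimal sketch, assuming an income of exactly 0 should be kept:

```sas
/* Sample data from the question */
data data1;
  input HOUSE_ID Date Income;
  cards;
111 20170101 25
111 20170208 30
111 20170617 .
333 20170623 400
333 20170705 -0.001
333 20170718 4000
;
run;

/* Keep only non-negative incomes. In SAS a missing numeric value is lower
   than any number, so Income >= 0 also excludes the missing record. */
proc means data=data1 n mean std min max;
  where Income >= 0;
  var Income;
run;
```

With the sample data this keeps 25, 30, 400 and 4000, so N = 4 and Mean = 1113.75. The same WHERE statement also works in the first PROC MEANS of Approach II.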

All Replies
SAS Super FREQ
Posts: 305

## Re: Which "min", "max" and "average" are correct?

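The second approach is wrong. In general you can't take the mean of a data set of means and get anything meaningful: each household mean counts equally no matter how many records went into it, so the result is not the mean income over all records.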
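The arithmetic on the sample data shows where the two results come from. Approach I averages the 5 non-missing records; Approach II averages the two household means, giving each household equal weight:

$$
\bar{x}_{\text{I}} = \frac{25 + 30 + 400 + (-0.001) + 4000}{5} = \frac{4454.999}{5} = 890.9998
$$

$$
\bar{x}_{111} = \frac{25 + 30}{2} = 27.5, \qquad
\bar{x}_{333} = \frac{400 + (-0.001) + 4000}{3} = \frac{4399.999}{3} \approx 1466.6663
$$

$$
\bar{x}_{\text{II}} = \frac{\bar{x}_{111} + \bar{x}_{333}}{2} \approx \frac{27.5 + 1466.6663}{2} \approx 747.0832
$$

Weighting each household mean $\bar{x}_h$ by its number of non-missing records $n_h$ recovers the Approach I value:

$$
\frac{\sum_h n_h \bar{x}_h}{\sum_h n_h} = \frac{2(27.5) + 3\left(\frac{4399.999}{3}\right)}{2 + 3} = \frac{55 + 4399.999}{5} = 890.9998
$$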
