Compute weighted average using SQL: missing values

xyxu · Posted 06-04-2021 09:43 PM

I compute weighted average using the following

proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

I realize that there are errors in computed weighted average "wave" when some values of variable "X" are missing within a group "Z", whereas "Y" are nonmissing. The denominator would be the sum of all nonmissing values of "Y", but the numerator is only the sum of nonmissing "X". Is there a good practice to solve this issue?

PGStats · Posted 06-04-2021 09:52 PM

proc sql;
create table mytable as
select sum(X * Y) / sum(case when X is missing then . else Y end) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

(untested)

You might want to modify the expression for ave as well.

PG

PGStats · Posted 06-05-2021 11:45 AM

Now, if you wanted wave and ave to be consistent (i.e. to represent the same data), things would be simpler:

proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
where nmiss(X, Y) = 0
group by Z;
quit;

(untested)

PG

xyxu · Posted 06-05-2021 03:08 PM

I want to compute wave and ave using the maximum number of nonmissing observations. My current code is

proc sql;
create table mytable as
select sum(X * Y) / sum(Y * (X ne .) ) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

This way wave and ave should be both valid and comparable?

PGStats · Posted 06-05-2021 03:14 PM

@xyxu wrote:

I want to compute wave and ave using the maximum number of nonmissing observations. My current code is
proc sql;
create table mytable as
select sum(X * Y) / sum(Y * (X ne .) ) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;
This way wave and ave should be both valid and comparable?

If there are missing Y values corresponding to non-missing X values then ave will be based on more observations than wave.

PG

PaigeMiller · Posted 06-05-2021 04:28 PM

@xyxu wrote:

I compute weighted average using the following
proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;
I realize that there are errors in computed weighted average "wave" when some values of variable "X" are missing within a group "Z", whereas "Y" are nonmissing. The denominator would be the sum of all nonmissing values of "Y", but the numerator is only the sum of nonmissing "X". Is there a good practice to solve this issue?

My opinion, a good practice to solve this issue, is that you should compute weighted averages in PROC SUMMARY/PROC MEANS because SAS gets it right in the presence of missing values.

If you use your own SQL code, you could easily get it wrong.

You (or your company or university) are paying for SAS and all of its benefits, including the work that went into getting this correct and applicable when missings are present, the debugging work and the verification that SAS performs. Use that work. Don't do it your self.

--
Paige Miller

Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Re: Compute weighted average using SQL: missing values

Registration is open

SAS Training: Just a Click Away