I compute weighted average using the following
proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;I realize that there are errors in computed weighted average "wave" when some values of variable "X" are missing within a group "Z", whereas "Y" are nonmissing. The denominator would be the sum of all nonmissing values of "Y", but the numerator is only the sum of nonmissing "X". Is there a good practice to solve this issue?
proc sql;
create table mytable as
select sum(X * Y) / sum(case when X is missing then . else Y end) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;
(untested)
You might want to modify the expression for ave as well.
Now, if you wanted wave and ave to be consistent (i.e. to represent the same data), things would be simpler:
proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
where nmiss(X, Y) = 0
group by Z;
quit;
(untested)
I want to compute wave and ave using the maximum number of nonmissing observations. My current code is
proc sql;
create table mytable as
select sum(X * Y) / sum(Y * (X ne .) ) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;This way wave and ave should be both valid and comparable?
@xyxu wrote:
I want to compute wave and ave using the maximum number of nonmissing observations. My current code is
proc sql; create table mytable as select sum(X * Y) / sum(Y * (X ne .) ) as wave, mean(X) as ave from my_dataset group by Z; quit;This way wave and ave should be both valid and comparable?
If there are missing Y values corresponding to non-missing X values then ave will be based on more observations than wave.
@xyxu wrote:
I compute weighted average using the following
proc sql; create table mytable as select sum(X * Y) / sum(Y) as wave, mean(X) as ave from my_dataset group by Z; quit;I realize that there are errors in computed weighted average "wave" when some values of variable "X" are missing within a group "Z", whereas "Y" are nonmissing. The denominator would be the sum of all nonmissing values of "Y", but the numerator is only the sum of nonmissing "X". Is there a good practice to solve this issue?
My opinion, a good practice to solve this issue, is that you should compute weighted averages in PROC SUMMARY/PROC MEANS because SAS gets it right in the presence of missing values.
If you use your own SQL code, you could easily get it wrong.
You (or your company or university) are paying for SAS and all of its benefits, including the work that went into getting this correct and applicable when missings are present, the debugging work and the verification that SAS performs. Use that work. Don't do it your self.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.