BookmarkSubscribeRSS Feed
xyxu
Quartz | Level 8

I compute weighted average using the following

proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

I realize that there are errors in computed weighted average "wave" when some values of variable "X" are missing within a group "Z", whereas "Y" are nonmissing. The denominator would be the sum of all nonmissing values of "Y", but the numerator is only the sum of nonmissing "X". Is there a good practice to solve this issue? 

5 REPLIES 5
PGStats
Opal | Level 21
proc sql;
create table mytable as
select sum(X * Y) / sum(case when X is missing then . else Y end) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

(untested)

 

You might want to modify the expression for ave as well.

 

PG
PGStats
Opal | Level 21

Now, if you wanted wave and ave to be consistent (i.e. to represent the same data), things would be simpler:

 

proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
where nmiss(X, Y) = 0
group by Z;
quit;

(untested)

PG
xyxu
Quartz | Level 8

I want to compute wave and ave using the maximum number of nonmissing observations. My current code is 

proc sql;
create table mytable as
select sum(X * Y) / sum(Y * (X ne .) ) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

This way wave and ave should be both valid and comparable?

PGStats
Opal | Level 21

@xyxu wrote:

I want to compute wave and ave using the maximum number of nonmissing observations. My current code is 

proc sql;
create table mytable as
select sum(X * Y) / sum(Y * (X ne .) ) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

This way wave and ave should be both valid and comparable?


If there are missing Y values corresponding to non-missing X values then ave will be based on more observations than wave.

PG
PaigeMiller
Diamond | Level 26

@xyxu wrote:

I compute weighted average using the following

proc sql;
create table mytable as
select sum(X * Y) / sum(Y) as wave, mean(X) as ave
from my_dataset
group by Z;
quit;

I realize that there are errors in computed weighted average "wave" when some values of variable "X" are missing within a group "Z", whereas "Y" are nonmissing. The denominator would be the sum of all nonmissing values of "Y", but the numerator is only the sum of nonmissing "X". Is there a good practice to solve this issue? 


 

My opinion, a good practice to solve this issue, is that you should compute weighted averages in PROC SUMMARY/PROC MEANS because SAS gets it right in the presence of missing values.

 

If you use your own SQL code, you could easily get it wrong.

 

You (or your company or university) are paying for SAS and all of its benefits, including the work that went into getting this correct and applicable when missings are present, the debugging work and the verification that SAS performs. Use that work. Don't do it your self.

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2416 views
  • 4 likes
  • 3 in conversation