Programming the statistical procedures from SAS

Weighted Mean

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Weighted Mean

[ Edited ]

Hi there, 

 

i am trying to compute some weighted means of 6 variables acorss time (time series). However, in some time periods one or two or more variables could be misisng. A brute forece weighted means approach would result in 'NO MEAN' being computed for any misisng data, how do i force it to compute a mean and re-weight when data is missing. 

 

Here is some code i have - would i have to use proc means? I have tried to brute force it, and i am sure it making simplications to the code i butchered it, but this doesnt acocunt for 2,3,4,or all 5, variables msising. Is there an easier way to do this? 

 

 

Data B;
Set A;
 
if missing(E) then SixVar=0.2099*A+ 0.1871*B+ 0.1458*C+ 0.1955*D+ 0.2617*E;
else if missing(E) then SixEFAVar=0.2291*A +0.1894*B + 0.1688*C+ 0.1765*D+ 0.2362*E;
else if missing(C) then SixEFAVar=0.2380*A+ 0.1968*B+ 0.1366*E+ 0.1833*D+ 0.2453*E;
else if missing(D) then SixEFAVar=0.2398*A+0.1983*B+ 0.1768*C+ 0.1377*E+ 0.2473*E;
else if missing(B) then SixEFAVar=0.2432*A+ 0.1792*C+ 0.1396*E+ 0.1873*D+ 0.2507*E;
else if missing(E) then SixEFAVar=0.2558*A+0.2116*B+ 0.1886*C+ 0.1469*E+ 0.1971*D;
else SixEFAVar=0.2024*A+ 0.1674*B+ 0.1492*C+ 0.1163*E+ 0.1559*D+ 0.2087*E;
 
Run;

 


Accepted Solutions
Solution
‎07-14-2017 11:42 AM
SAS Super FREQ
Posts: 3,752

Re: Weighted Mean

There are two ideas being discussed here.

1. The OP asked about a "weighted mean" of VARIABLES, but it seems what she really wants is a linear combination of six variables, where the coefficients change if one or more variables have missing values.

2. There were a few comments about using PROC EXPAND, which computes a moving average of OBSERVATIONS. I see no evidence that this is what the OP wants.

 

The reason I think that the OP wants (1) is because she mentions principal components.

 

Mila: The SUM function handles missing values, whereas the '+' operator propagates missing values. That is why someone suggested using SUM. 

 

If you want to standardize the weights, just divide the sum by the sum of the weights, For example

y = sum(A1*x1, A2*x2, ..., A6*x6) / sum(A1, A2, ..., A6);

See the article "Compute a weighted mean in SAS."

View solution in original post


All Replies
PROC Star
Posts: 742

Re: Weigthed Mean

Occasional Contributor
Posts: 19

Re: Weigthed Mean

hi there would u be able to help me out with syntax- i seem to having issues- is the wma function built for time series data in that it is average acorss time? 

 

I apologize if my post was confusing- while my data is a time series, i intend to calculate weighted moving averages in each time period. So i intend to average 6 variables in each time period, here is what i have and it is erorring out: 

 

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;
id DATE;
y [6] A B C D E F
convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));
run;

 

if i were to add a ";" behing line 3 convert erorrs out

Trusted Advisor
Posts: 1,137

Re: Weigthed Mean

try the sum function

 

Data B;
Set A;
if missing(E) then SixVar=sum(0.2099*A, 0.1871*B, 0.1458*C, 0.1955*D, 0.2617*E);
else if missing(E) then SixEFAVar=sum(0.2291*A ,0.1894*B , 0.1688*C, 0.1765*D, 0.2362*E);
else if missing(C) then SixEFAVar=sum(0.2380*A, 0.1968*B, 0.1366*E, 0.1833*D, 0.2453*E);
else if missing(D) then SixEFAVar=sum(0.2398*A,0.1983*B, 0.1768*C, 0.1377*E, 0.2473*E);
else if missing(B) then SixEFAVar=sum(0.2432*A, 0.1792*C, 0.1396*E, 0.1873*D, 0.2507*E);
else if missing(E) then SixEFAVar=sum(0.2558*A,0.2116*B, 0.1886*C, 0.1469*E, 0.1971*D);
else SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*E, 0.1559*D, 0.2087*E);
Run;
Thanks,
Jag
Occasional Contributor
Posts: 19

Re: Weigthed Mean

Posted in reply to Jagadishkatam
Would you be able to help me understand why conceptually this would work? tq
Occasional Contributor
Posts: 19

Re: Weigthed Mean

Posted in reply to Jagadishkatam

I was able to test the code out and below i am not getting a properly reweighted number: 

 

The good news is that some number is being populated even in rows where 1,2,3 data points of the 6 i misisng - but i am unable to get the correct number having done this manually as a check (see image below)

 

Code i have used is the following - assuming this does the reweighting for missing data (no need for if statements)

 

data ccm_crsp_monthly4Combo;
set ccm_crsp_monthly4;
SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F;
SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);
Run;

 

 


Capture2.PNG
Super User
Posts: 19,789

Re: Weigthed Mean

No, the SUM() function uses the same weights no matter what. But if a variable is missing it doesn't count towards the total. Given what you've stated so far that doesn't match your requirements.

 


Mila wrote:

I was able to test the code out and below i am not getting a properly reweighted number: 

 

The good news is that some number is being populated even in rows where 1,2,3 data points of the 6 i misisng - but i am unable to get the correct number having done this manually as a check (see image below)

 

Code i have used is the following - assuming this does the reweighting for missing data (no need for if statements)

 

data ccm_crsp_monthly4Combo;
set ccm_crsp_monthly4;
SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F;
SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);
Run;

 

 


 

Occasional Contributor
Posts: 19

Re: Weigthed Mean

Any suggestions as to how i can change that SUM to rescale to 100%

Occasional Contributor
Posts: 19

Weighted Averages (WMA) in each time period repeated across time

Dear all,

 

I am trying to compute weighted means of 6 variables in each time period (given weights that sum to 1) and then for the process to be repeated with exact same weights across each time.  

 

My data is such that in some time periods one or two or more variables could be missing. A brute force weighted means approach would result in 'NO MEAN' being computed for any missing data. How can I force the code to compute a mean regardless and to re-weight the remaining data when some data is missing. 

 

Here are two thing I have tried but failed

 

Data B;

Set A;

 

if missing(E) then SixVar=0.2099*A+ 0.1871*B+ 0.1458*C+ 0.1955*D+ 0.2617*E;

else if missing(E) then SixEFAVar=0.2291*A +0.1894*B + 0.1688*C+ 0.1765*D+ 0.2362*E;

else if missing(C) then SixEFAVar=0.2380*A+ 0.1968*B+ 0.1366*E+ 0.1833*D+ 0.2453*E;

else if missing(D) then SixEFAVar=0.2398*A+0.1983*B+ 0.1768*C+ 0.1377*E+ 0.2473*E;

else if missing(B) then SixEFAVar=0.2432*A+ 0.1792*C+ 0.1396*E+ 0.1873*D+ 0.2507*E;

else if missing(E) then SixEFAVar=0.2558*A+0.2116*B+ 0.1886*C+ 0.1469*E+ 0.1971*D;

else SixEFAVar=0.2024*A+ 0.1674*B+ 0.1492*C+ 0.1163*E+ 0.1559*D+ 0.2087*E;

 

Run;

 

/* Also experiment with proc expand- but maybe my understanding of this proc is flawed or syntax is incorrect */

 

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;
id DATE;
y [6] A B C D E F
convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));
run;

 

Any guidance would be much appreciated!

Super User
Posts: 19,789

Re: Weigthed Mean

Are your weights stored in a table? If so, please post that as a sample dataset in addition to your data and expected output. 

 

PS. Please do not post the same question multiple times. I'll merge the two threads now. 

Occasional Contributor
Posts: 19

Re: Weigthed Mean

No they are not, because they are literally 6 exogenously given weights from pca and efa analysis i have just harcoded them in below format
Super User
Posts: 19,789

Re: Weigthed Mean


Mila wrote:
No they are not, because they are literally 6 exogenously given weights from pca and efa analysis i have just harcoded them in below format

So you have 6 different set of weights, one for each case where one variable is missing? What happens if you have two variables that are missing? How are you 'rescaling' the weights for each missing values? 

Occasional Contributor
Posts: 19

Re: Weigthed Mean

Reeza,

 

This is the problem i am facing - the issue is i am not sure how best to rescale for missing values- someone suggested doing the following:

 

data ccm_crsp_monthly4Combo;
set ccm_crsp_monthly4;
SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F);
SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);
Run;

 

above seems to work in that, if there are missing datapoints it still populated a number- but the number is not consistent with rescaling to 100%

 

I also tried proc expand as someone had suggested but I am struggling with syntax and that seems to want to do weights acorss dates instead of within the same date (maybe I am not understanding the procedure well enough):

 

 

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;
id DATE;
y [6] A B C D E F
convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));
run;

Super User
Posts: 19,789

Re: Weigthed Mean


Mila wrote:

Reeza,

 

This is the problem i am facing - the issue is i am not sure how best to rescale for missing values- someone suggested doing the following:

 

 

That is a problem. You have to know what you wnat to do to be able to code it Smiley Happy

I don't have time to code it now, but I suggest loading your weights into an array. Then calculate the number missing and weights you need, and rescale them as needed for each row. You can use an array to hold the new weights as well. Then do the multiplication.

 

 

 

Occasional Contributor
Posts: 19

Re: Weigthed Mean

I see so id have to do it for each row- there isnt a precanned function or procedure for it? Thanks

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 208 views
  • 3 likes
  • 5 in conversation