turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Weighted Mean

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 02:22 AM - last edited on 07-14-2017 10:48 AM by Reeza

Hi there,

i am trying to compute some weighted means of 6 variables acorss time (time series). However, in some time periods one or two or more variables could be misisng. A brute forece weighted means approach would result in 'NO MEAN' being computed for any misisng data, how do i force it to compute a mean and re-weight when data is missing.

Here is some code i have - would i have to use proc means? I have tried to brute force it, and i am sure it making simplications to the code i butchered it, but this doesnt acocunt for 2,3,4,or all 5, variables msising. Is there an easier way to do this?

```
Data B;
Set A;
if missing(E) then SixVar=0.2099*A+ 0.1871*B+ 0.1458*C+ 0.1955*D+ 0.2617*E;
else if missing(E) then SixEFAVar=0.2291*A +0.1894*B + 0.1688*C+ 0.1765*D+ 0.2362*E;
else if missing(C) then SixEFAVar=0.2380*A+ 0.1968*B+ 0.1366*E+ 0.1833*D+ 0.2453*E;
else if missing(D) then SixEFAVar=0.2398*A+0.1983*B+ 0.1768*C+ 0.1377*E+ 0.2473*E;
else if missing(B) then SixEFAVar=0.2432*A+ 0.1792*C+ 0.1396*E+ 0.1873*D+ 0.2507*E;
else if missing(E) then SixEFAVar=0.2558*A+0.2116*B+ 0.1886*C+ 0.1469*E+ 0.1971*D;
else SixEFAVar=0.2024*A+ 0.1674*B+ 0.1492*C+ 0.1163*E+ 0.1559*D+ 0.2087*E;
Run;
```

Accepted Solutions

Solution

07-14-2017
11:42 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:35 AM

There are two ideas being discussed here.

1. The OP asked about a "weighted mean" of VARIABLES, but it seems what she really wants is a linear combination of six variables, where the coefficients change if one or more variables have missing values.

2. There were a few comments about using PROC EXPAND, which computes a moving average of OBSERVATIONS. I see no evidence that this is what the OP wants.

The reason I think that the OP wants (1) is because she mentions principal components.

Mila: The SUM function handles missing values, whereas the '+' operator propagates missing values. That is why someone suggested using SUM.

If you want to standardize the weights, just divide the sum by the sum of the weights, For example

y = sum(A1*x1, A2*x2, ..., A6*x6) / sum(A1, A2, ..., A6);

See the article "Compute a weighted mean in SAS."

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 03:09 AM

Use PROC EXPAND as in this example

http://blogs.sas.com/content/iml/2016/01/27/moving-average-in-sas.html

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 09:50 AM

hi there would u be able to help me out with syntax- i seem to having issues- is the wma function built for time series data in that it is average acorss time?

I apologize if my post was confusing- while my data is a time series, i intend to calculate weighted moving averages in each time period. So i intend to average 6 variables in each time period, here is what i have and it is erorring out:

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;

id DATE;

y [6] A B C D E F

convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));

run;

if i were to add a ";" behing line 3 convert erorrs out

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 03:11 AM

try the sum function

```
Data B;
Set A;
if missing(E) then SixVar=sum(0.2099*A, 0.1871*B, 0.1458*C, 0.1955*D, 0.2617*E);
else if missing(E) then SixEFAVar=sum(0.2291*A ,0.1894*B , 0.1688*C, 0.1765*D, 0.2362*E);
else if missing(C) then SixEFAVar=sum(0.2380*A, 0.1968*B, 0.1366*E, 0.1833*D, 0.2453*E);
else if missing(D) then SixEFAVar=sum(0.2398*A,0.1983*B, 0.1768*C, 0.1377*E, 0.2473*E);
else if missing(B) then SixEFAVar=sum(0.2432*A, 0.1792*C, 0.1396*E, 0.1873*D, 0.2507*E);
else if missing(E) then SixEFAVar=sum(0.2558*A,0.2116*B, 0.1886*C, 0.1469*E, 0.1971*D);
else SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*E, 0.1559*D, 0.2087*E);
Run;
```

Thanks,

Jag

Jag

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 09:56 AM

Would you be able to help me understand why conceptually this would work? tq

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:21 AM

I was able to test the code out and below i am not getting a properly reweighted number:

The good news is that some number is being populated even in rows where 1,2,3 data points of the 6 i misisng - but i am unable to get the correct number having done this manually as a check (see image below)

Code i have used is the following - assuming this does the reweighting for missing data (no need for if statements)

data ccm_crsp_monthly4Combo;

set ccm_crsp_monthly4;

SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F;

SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);

Run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:24 AM

No, the SUM() function uses the same weights no matter what. But if a variable is missing it doesn't count towards the total. Given what you've stated so far that doesn't match your requirements.

Mila wrote:

I was able to test the code out and below i am not getting a properly reweighted number:

The good news is that some number is being populated even in rows where 1,2,3 data points of the 6 i misisng - but i am unable to get the correct number having done this manually as a check (see image below)

Code i have used is the following - assuming this does the reweighting for missing data (no need for if statements)

data ccm_crsp_monthly4Combo;

set ccm_crsp_monthly4;

SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F;

SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);

Run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:29 AM

Any suggestions as to how i can change that SUM to rescale to 100%

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 10:40 AM

Dear all,

I am trying to compute weighted means of 6 variables in each time period (given weights that sum to 1) and then for the process to be repeated with exact same weights across each time.

My data is such that in some time periods one or two or more variables could be missing. A brute force weighted means approach would result in 'NO MEAN' being computed for any missing data. How can I force the code to compute a mean regardless and to re-weight the remaining data when some data is missing.

Here are two thing I have tried but failed

**Data B**;

Set A;

if missing(E) then SixVar=**0.2099***A+ **0.1871***B+ **0.1458***C+ **0.1955***D+ **0.2617***E;

else if missing(E) then SixEFAVar=**0.2291***A +**0.1894***B + **0.1688***C+ **0.1765***D+ **0.2362***E;

else if missing(C) then SixEFAVar=**0.2380***A+ **0.1968***B+ **0.1366***E+ **0.1833***D+ **0.2453***E;

else if missing(D) then SixEFAVar=**0.2398***A+**0.1983***B+ **0.1768***C+ **0.1377***E+ **0.2473***E;

else if missing(B) then SixEFAVar=**0.2432***A+ **0.1792***C+ **0.1396***E+ **0.1873***D+ **0.2507***E;

else if missing(E) then SixEFAVar=**0.2558***A+**0.2116***B+ **0.1886***C+ **0.1469***E+ **0.1971***D;

else SixEFAVar=**0.2024***A+ **0.1674***B+ **0.1492***C+ **0.1163***E+ **0.1559***D+ **0.2087***E;

**Run**;

/* Also experiment with proc expand- but maybe my understanding of this proc is flawed or syntax is incorrect */

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;

id DATE;

y [6] A B C D E F

convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));

run;

Any guidance would be much appreciated!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 10:46 AM

Are your weights stored in a table? If so, please post that as a sample dataset in addition to your data and expected output.

PS. Please do not post the same question multiple times. I'll merge the two threads now.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:08 AM

No they are not, because they are literally 6 exogenously given weights from pca and efa analysis i have just harcoded them in below format

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:20 AM

Mila wrote:

No they are not, because they are literally 6 exogenously given weights from pca and efa analysis i have just harcoded them in below format

So you have 6 different set of weights, one for each case where one variable is missing? What happens if you have two variables that are missing? How are you 'rescaling' the weights for each missing values?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:25 AM

Reeza,

This is the problem i am facing - the issue is i am not sure how best to rescale for missing values- someone suggested doing the following:

data ccm_crsp_monthly4Combo;

set ccm_crsp_monthly4;

SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F);

SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);

Run;

above seems to work in that, if there are missing datapoints it still populated a number- but the number is not consistent with rescaling to 100%

I also tried proc expand as someone had suggested but I am struggling with syntax and that seems to want to do weights acorss dates instead of within the same date (maybe I am not understanding the procedure well enough):

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;

id DATE;

y [6] A B C D E F

convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:29 AM

Mila wrote:

Reeza,

This is the problem i am facing - the issue is i am not sure how best to rescale for missing values- someone suggested doing the following:

That is a problem. You have to know what you wnat to do to be able to code it

I don't have time to code it now, but I suggest loading your weights into an array. Then calculate the number missing and weights you need, and rescale them as needed for each row. You can use an array to hold the new weights as well. Then do the multiplication.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-14-2017 11:31 AM

I see so id have to do it for each row- there isnt a precanned function or procedure for it? Thanks