BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mila
Obsidian | Level 7

Hi there, 

 

i am trying to compute some weighted means of 6 variables acorss time (time series). However, in some time periods one or two or more variables could be misisng. A brute forece weighted means approach would result in 'NO MEAN' being computed for any misisng data, how do i force it to compute a mean and re-weight when data is missing. 

 

Here is some code i have - would i have to use proc means? I have tried to brute force it, and i am sure it making simplications to the code i butchered it, but this doesnt acocunt for 2,3,4,or all 5, variables msising. Is there an easier way to do this? 

 

 

Data B;
Set A;
 
if missing(E) then SixVar=0.2099*A+ 0.1871*B+ 0.1458*C+ 0.1955*D+ 0.2617*E;
else if missing(E) then SixEFAVar=0.2291*A +0.1894*B + 0.1688*C+ 0.1765*D+ 0.2362*E;
else if missing(C) then SixEFAVar=0.2380*A+ 0.1968*B+ 0.1366*E+ 0.1833*D+ 0.2453*E;
else if missing(D) then SixEFAVar=0.2398*A+0.1983*B+ 0.1768*C+ 0.1377*E+ 0.2473*E;
else if missing(B) then SixEFAVar=0.2432*A+ 0.1792*C+ 0.1396*E+ 0.1873*D+ 0.2507*E;
else if missing(E) then SixEFAVar=0.2558*A+0.2116*B+ 0.1886*C+ 0.1469*E+ 0.1971*D;
else SixEFAVar=0.2024*A+ 0.1674*B+ 0.1492*C+ 0.1163*E+ 0.1559*D+ 0.2087*E;
 
Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

There are two ideas being discussed here.

1. The OP asked about a "weighted mean" of VARIABLES, but it seems what she really wants is a linear combination of six variables, where the coefficients change if one or more variables have missing values.

2. There were a few comments about using PROC EXPAND, which computes a moving average of OBSERVATIONS. I see no evidence that this is what the OP wants.

 

The reason I think that the OP wants (1) is because she mentions principal components.

 

Mila: The SUM function handles missing values, whereas the '+' operator propagates missing values. That is why someone suggested using SUM. 

 

If you want to standardize the weights, just divide the sum by the sum of the weights, For example

y = sum(A1*x1, A2*x2, ..., A6*x6) / sum(A1, A2, ..., A6);

See the article "Compute a weighted mean in SAS."

View solution in original post

17 REPLIES 17
Mila
Obsidian | Level 7

hi there would u be able to help me out with syntax- i seem to having issues- is the wma function built for time series data in that it is average acorss time? 

 

I apologize if my post was confusing- while my data is a time series, i intend to calculate weighted moving averages in each time period. So i intend to average 6 variables in each time period, here is what i have and it is erorring out: 

 

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;
id DATE;
y [6] A B C D E F
convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));
run;

 

if i were to add a ";" behing line 3 convert erorrs out

Jagadishkatam
Amethyst | Level 16

try the sum function

 

Data B;
Set A;
if missing(E) then SixVar=sum(0.2099*A, 0.1871*B, 0.1458*C, 0.1955*D, 0.2617*E);
else if missing(E) then SixEFAVar=sum(0.2291*A ,0.1894*B , 0.1688*C, 0.1765*D, 0.2362*E);
else if missing(C) then SixEFAVar=sum(0.2380*A, 0.1968*B, 0.1366*E, 0.1833*D, 0.2453*E);
else if missing(D) then SixEFAVar=sum(0.2398*A,0.1983*B, 0.1768*C, 0.1377*E, 0.2473*E);
else if missing(B) then SixEFAVar=sum(0.2432*A, 0.1792*C, 0.1396*E, 0.1873*D, 0.2507*E);
else if missing(E) then SixEFAVar=sum(0.2558*A,0.2116*B, 0.1886*C, 0.1469*E, 0.1971*D);
else SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*E, 0.1559*D, 0.2087*E);
Run;
Thanks,
Jag
Mila
Obsidian | Level 7
Would you be able to help me understand why conceptually this would work? tq
Mila
Obsidian | Level 7

I was able to test the code out and below i am not getting a properly reweighted number: 

 

The good news is that some number is being populated even in rows where 1,2,3 data points of the 6 i misisng - but i am unable to get the correct number having done this manually as a check (see image below)

 

Code i have used is the following - assuming this does the reweighting for missing data (no need for if statements)

 

data ccm_crsp_monthly4Combo;
set ccm_crsp_monthly4;
SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F;
SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);
Run;

 

 


Capture2.PNG
Reeza
Super User

No, the SUM() function uses the same weights no matter what. But if a variable is missing it doesn't count towards the total. Given what you've stated so far that doesn't match your requirements.

 


@Mila wrote:

I was able to test the code out and below i am not getting a properly reweighted number: 

 

The good news is that some number is being populated even in rows where 1,2,3 data points of the 6 i misisng - but i am unable to get the correct number having done this manually as a check (see image below)

 

Code i have used is the following - assuming this does the reweighting for missing data (no need for if statements)

 

data ccm_crsp_monthly4Combo;
set ccm_crsp_monthly4;
SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F;
SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);
Run;

 

 


 

Mila
Obsidian | Level 7

Any suggestions as to how i can change that SUM to rescale to 100%

Mila
Obsidian | Level 7

Dear all,

 

I am trying to compute weighted means of 6 variables in each time period (given weights that sum to 1) and then for the process to be repeated with exact same weights across each time.  

 

My data is such that in some time periods one or two or more variables could be missing. A brute force weighted means approach would result in 'NO MEAN' being computed for any missing data. How can I force the code to compute a mean regardless and to re-weight the remaining data when some data is missing. 

 

Here are two thing I have tried but failed

 

Data B;

Set A;

 

if missing(E) then SixVar=0.2099*A+ 0.1871*B+ 0.1458*C+ 0.1955*D+ 0.2617*E;

else if missing(E) then SixEFAVar=0.2291*A +0.1894*B + 0.1688*C+ 0.1765*D+ 0.2362*E;

else if missing(C) then SixEFAVar=0.2380*A+ 0.1968*B+ 0.1366*E+ 0.1833*D+ 0.2453*E;

else if missing(D) then SixEFAVar=0.2398*A+0.1983*B+ 0.1768*C+ 0.1377*E+ 0.2473*E;

else if missing(B) then SixEFAVar=0.2432*A+ 0.1792*C+ 0.1396*E+ 0.1873*D+ 0.2507*E;

else if missing(E) then SixEFAVar=0.2558*A+0.2116*B+ 0.1886*C+ 0.1469*E+ 0.1971*D;

else SixEFAVar=0.2024*A+ 0.1674*B+ 0.1492*C+ 0.1163*E+ 0.1559*D+ 0.2087*E;

 

Run;

 

/* Also experiment with proc expand- but maybe my understanding of this proc is flawed or syntax is incorrect */

 

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;
id DATE;
y [6] A B C D E F
convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));
run;

 

Any guidance would be much appreciated!

Reeza
Super User

Are your weights stored in a table? If so, please post that as a sample dataset in addition to your data and expected output. 

 

PS. Please do not post the same question multiple times. I'll merge the two threads now. 

Mila
Obsidian | Level 7
No they are not, because they are literally 6 exogenously given weights from pca and efa analysis i have just harcoded them in below format
Reeza
Super User

@Mila wrote:
No they are not, because they are literally 6 exogenously given weights from pca and efa analysis i have just harcoded them in below format

So you have 6 different set of weights, one for each case where one variable is missing? What happens if you have two variables that are missing? How are you 'rescaling' the weights for each missing values? 

Mila
Obsidian | Level 7

Reeza,

 

This is the problem i am facing - the issue is i am not sure how best to rescale for missing values- someone suggested doing the following:

 

data ccm_crsp_monthly4Combo;
set ccm_crsp_monthly4;
SixPCAVar=sum(0.1739*A, 0.1723*B, 0.159*C, 0.1467*D, 0.1644*E, 0.1838*F);
SixEFAVar=sum(0.2024*A, 0.1674*B, 0.1492*C, 0.1163*D, 0.1559*E, 0.2087*F);
Run;

 

above seems to work in that, if there are missing datapoints it still populated a number- but the number is not consistent with rescaling to 100%

 

I also tried proc expand as someone had suggested but I am struggling with syntax and that seems to want to do weights acorss dates instead of within the same date (maybe I am not understanding the procedure well enough):

 

 

proc expand data=ccm_crsp_monthly4ComboA out=ccm_crsp_monthly4Combo method=none;
id DATE;
y [6] A B C D E F
convert y = WMA / transout=(movave(0.1739 0.1723 0.159 0.1467 0.164 0.1838));
run;

Reeza
Super User

@Mila wrote:

Reeza,

 

This is the problem i am facing - the issue is i am not sure how best to rescale for missing values- someone suggested doing the following:

 

 

That is a problem. You have to know what you wnat to do to be able to code it 🙂

I don't have time to code it now, but I suggest loading your weights into an array. Then calculate the number missing and weights you need, and rescale them as needed for each row. You can use an array to hold the new weights as well. Then do the multiplication.

 

 

 

Mila
Obsidian | Level 7

I see so id have to do it for each row- there isnt a precanned function or procedure for it? Thanks

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 4539 views
  • 3 likes
  • 5 in conversation