How to calculate weighted average in rolling windows

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 130
Accepted Solution

How to calculate weighted average in rolling windows

I will upload sample data latter when I am allowed to do so, currently the system doesn't allow to upload any type of files.)

The sample data is sorted by id, year, month and day.

yr_qrt represents year and quarter.

yr_mth represents year and month.

x_daily is daily data.

y_qrtly is quarterly data, so it does not change within a quarter (its value in the real data is random number).

What I try to do is as showed in the table named "result":

  1. For each id, calculate the mean and standard deviation of x_daily in 3-month window (outputs denoted as x_mean, x_std).
  2. Rolling one month forward, then calculate the mean and standard deviation of x_daily in another 3-month window.
  3. Each time rolling one month forward until the end.

For example,

the first line in the 'result' calculates the mean and standard deviation for x_daily of 1993 Jan, Feb, and Mar;

the second line in the 'result' rolling one month forward, so it calculates the mean and standard deviation for x_daily of 1993 Feb, Mar and Apr.

so on and so forth.

The tricky part is for y_qrtly_weighted.

Y_qrtly is quarterly data, so it is different for different quarters.

Each time rolling one month forward, y_qrtly will include values from two different quarters, as showed in image "3-month window", so I need to calculate weighted average (the result is denoted as "y_qrtly_weighted"), eg., in line 2 of the 'result', y_qrtly_weighted = =(2/3)*21+(1/3)*22 (because two months are from quarter 1, and one month from quarter 2).

**if I use 6-month window, and each time rolling one month forward, then y_qrtly will include values from three different quarters, as showed in image "6-month window".

Thank you as always!


Results
idyr_mthx_meanx_stdy_qrtly_weightedNOTE for variable 'y_qrtly_weighted': including y_qrtly from 3 months
C11993.010.5862720.222748211993.01, 1993.02, 1993.03all from 1993 quarter 1, so y_qrtly_weighted =(3/3)*21+(0/3)*22
C11993.020.4674270.27275721.3333331993.02, 1993.03, 1993.041993.02 and 1993.03 from 1993 q1, while 1993.04 from 1993 q2.  =(2/3)*21+(1/3)*22
C11993.030.4649200.25326921.6666671993.03, 1993.04, 1993.051993.03 from 1993 q1, while 1993.04 and 1993.05 from 1993 q2.  =(1/3)*21+(2/3)*22
C11993.040.4269830.176654221993.04, 1993.05, 1993.06all from 1993 quarter 2. =(3/3)*22+(0/3)*23
C11993.050.4268600.20785222.3333331993.05, 1993.06, 1993.071993.05 and 1993.06 from 1993 q2, while 1993.07 from 1993 q3.  =(2/3)*22+(1/3)*23
C11993.060.4276720.26634422.6666671993.06, 1993.07, 1993.081993.06 from 1993 q2, while 1993.07 and 1993.08 from 1993 q3.  =(1/3)*22+(2/3)*23
C11993.070.4510260.279004231993.07, 1993.08, 1993.09all from 1993 quarter 3. =(3/3)*23+(0/3)*24
C11993.080.5212820.21995923.333333331993.08, 1993.09, 1993.10
C11993.090.6047530.20195623.666666671993.09, 1993.10, 1993.11
C11993.100.5055920.279543241993.10, 1993.11, 1993.12
C11993.110.4136570.32511724.333333331993.11, 1993.12, 1994.01
C11993.120.2370060.17864224.666666671993.12, 1994.01, 1994.02
C11994.010.3187830.202079251994.01, 1994.02, 1994.03
C11994.020.4466950.3145925.333333331994.02, 1994.03, 1994.04
C11994.030.5935230.32385525.666666671994.03, 1994.04, 1994.05
C11994.040.5277370.38738261994.04, 1994.05, 1994.06
C11994.050.5286120.30944926.333333331994.05, 1994.06, 1994.07
C11994.060.4629340.30688926.666666671994.06, 1994.07, 1994.08
C11994.070.5295190.2793271994.07, 1994.08, 1994.09
C11994.080.5629380.28971627.333333331994.08, 1994.09, 1994.10
C11994.090.5200110.24293627.666666671994.09, 1994.10, 1994.11
C11994.100.5281830.206463281994.10, 1994.11, 1994.12
C21993.01
C21993.02
C21993.03
C21993.04
C21993.05


3-month window, each time rolling 1 month forward

YearQuarterMonth3-month window, rolling 1 month
199311@
2@@
3@@@
24@@@
5@@@
6@@@
37@@@
8@@@
9@@@
410@@@
11@@
12@

6-month window, each time rolling 1 month forward

YearQuarterMonth6-month window
199311@
2@@
3@@@
24@@@@
5@@@@@
6@@@@@@
37@@@@@@
8@@@@@
9@@@@
410@@@
11@@
12@

Sample Data
idyearquartermonthdayyr_qrtyr_mthx_dailyy_qrtly
C119931111993.11993.010.699186382221
C119931121993.11993.010.418809544521
C119931131993.11993.010.766921430321
C119931211993.11993.020.762633700421
C119931221993.11993.020.331092233821
C119931231993.11993.020.472057196021
C119931311993.11993.030.318231096221
C119931321993.11993.030.546047018821
C119931331993.11993.030.961472076521
C119932411993.21993.040.244851423722
C119932421993.21993.040.060549691122
C119932431993.21993.040.509908956122
C119932511993.21993.050.597467863622
C119932521993.21993.050.414423629522
C119932531993.21993.050.531327527722
C119932611993.21993.060.433212184322
C119932621993.21993.060.614294484722
C119932631993.21993.060.436814580722
C119933711993.31993.070.163581518323
C119933721993.31993.070.625499835723
C119933731993.31993.070.025114820023
C119933811993.31993.080.224589891123
C119933821993.31993.080.897134014123
C119933831993.31993.080.428807630923
C119933911993.31993.090.528914113823
C119933921993.31993.090.714569119823
C119933931993.31993.090.403845250923
C1199341011993.41993.100.751077418124
C1199341021993.41993.100.381516439624
C1199341031993.41993.100.361081883724
C1199341111993.41993.110.593088000324
C1199341121993.41993.110.934246270024
C1199341131993.41993.110.774434480524
C1199341211993.41993.120.034602863024
C1199341221993.41993.120.340202925924
C1199341231993.41993.120.380080561924
C119941111994.11994.010.539568826125
C119941121994.11994.010.044099092125
C119941131994.11994.010.082589756925
C119941211994.11994.020.094616156025
C119941221994.11994.020.302573060125
C119941231994.11994.020.314721770325
C119941311994.11994.030.473469533825
C119941321994.11994.030.501516888825
C119941331994.11994.030.515894287825
C119942411994.21994.040.941944721326
C119942421994.21994.040.004376345726
C119942431994.21994.040.871145008826
C119942511994.21994.050.275740759526
C119942521994.21994.050.899140367726
C119942531994.21994.050.858478591526
C119942611994.21994.060.611099092926
C119942621994.21994.060.254739099426
C119942631994.21994.060.032972226226
C119943711994.31994.070.475158930727
C119943721994.31994.070.471635181427
C119943731994.31994.070.878543314727
C119943811994.31994.080.203039629227
C119943821994.31994.080.946478395727
C119943831994.31994.080.292736747327
C119943911994.31994.090.812926409627
C119943921994.31994.090.274682079627
C119943931994.31994.090.410469389827
C1199441011994.41994.100.918566778928
C1199441021994.41994.100.496057366828
C1199441031994.41994.100.711484509328
C1199441111994.41994.110.239881888728
C1199441121994.41994.110.322815021828
C1199441131994.41994.110.493211170428
C1199441211994.41994.120.631640484228
C1199441221994.41994.120.399312610128
C1199441231994.41994.120.540675516028
C219931111993.11993.010.140410666431
C219931121993.11993.010.004218107131
C219931131993.11993.010.661278919631
C219931211993.11993.020.986600075731
C219931221993.11993.020.748753952631
C219931231993.11993.020.554971209231
C219931311993.11993.030.237742861431
C219931321993.11993.030.197590820031
C219931331993.11993.030.879083764831
C219932411993.21993.040.592725557332
C219932421993.21993.040.775360519632
C219932431993.21993.040.613098246532
C219932511993.21993.050.237163720432
C219932521993.21993.050.940749551832
C219932531993.21993.050.373490162032
C219932611993.21993.060.495136747432
C219932621993.21993.060.753714895032
C219932631993.21993.060.654817916832
C219933711993.31993.070.150528873533
C219933721993.31993.070.317873882433
C219933731993.31993.070.576810482733
C219933811993.31993.080.816348776633
C219933821993.31993.080.866998744033
C219933831993.31993.080.250220487933
C219933911993.31993.090.600788239033
C219933921993.31993.090.734128994033
C219933931993.31993.090.061041862133
C2199341011993.41993.100.278045898034
C2199341021993.41993.100.088641086234
C2199341031993.41993.100.269269630834
C2199341111993.41993.110.833931494334
C2199341121993.41993.110.803732228534
C2199341131993.41993.110.407617251834
C2199341211993.41993.120.180167867134
C2199341221993.41993.120.401591421934
C2199341231993.41993.120.636811619234
C219941111994.11994.010.999303915735
C219941121994.11994.010.062368783135
C219941131994.11994.010.556475881435
C219941211994.11994.020.515928158235
C219941221994.11994.020.025983841735
C219941231994.11994.020.244261401235
C219941311994.11994.030.350632584135
C219941321994.11994.030.385986837835
C219941331994.11994.030.118107724735
C219942411994.21994.040.274712889736
C219942421994.21994.040.036962887636
C219942431994.21994.040.545917156636
C219942511994.21994.050.596276691536
C219942521994.21994.050.017930838336
C219942531994.21994.050.459782284136
C219942611994.21994.060.581456189236
C219942621994.21994.060.527838635736
C219942631994.21994.060.730804962436
C219943711994.31994.070.258896802537
C219943721994.31994.070.359950672537
C219943731994.31994.070.671786039337
C219943811994.31994.080.701823027937
C219943821994.31994.080.123291467437
C219943831994.31994.080.876729422637
C219943911994.31994.090.323825966337
C219943921994.31994.090.779673875537
C219943931994.31994.090.046542438237
C2199441011994.41994.100.098154139338
C2199441021994.41994.100.963187562038
C2199441031994.41994.100.180379335638
C2199441111994.41994.110.271139278838
C2199441121994.41994.110.936026626138
C2199441131994.41994.110.489233839138
C2199441211994.41994.120.851815613738
C2199441221994.41994.120.930916312138
C2199441231994.41994.120.718714334038

Accepted Solutions
Solution
‎07-03-2015 10:51 PM
Super User
Posts: 10,035

Re: How to calculate weighted average in rolling windows

Code: Program

data have;
infile cards truncover expandtabs;
input id $ year quarter month day yr_qrt  $ yr_mth $ x_daily y_qrtly;
cards;
C1 1993 1 1 1 1993.1 1993.01 0.6991863822 21
C1 1993 1 1 2 1993.1 1993.01 0.4188095445 21
C1 1993 1 1 3 1993.1 1993.01 0.7669214303 21
C1 1993 1 2 1 1993.1 1993.02 0.7626337004 21
C1 1993 1 2 2 1993.1 1993.02 0.3310922338 21
C1 1993 1 2 3 1993.1 1993.02 0.4720571960 21
C1 1993 1 3 1 1993.1 1993.03 0.3182310962 21
C1 1993 1 3 2 1993.1 1993.03 0.5460470188 21
C1 1993 1 3 3 1993.1 1993.03 0.9614720765 21
C1 1993 2 4 1 1993.2 1993.04 0.2448514237 22
C1 1993 2 4 2 1993.2 1993.04 0.0605496911 22
C1 1993 2 4 3 1993.2 1993.04 0.5099089561 22
C1 1993 2 5 1 1993.2 1993.05 0.5974678636 22
C1 1993 2 5 2 1993.2 1993.05 0.4144236295 22
C1 1993 2 5 3 1993.2 1993.05 0.5313275277 22
C1 1993 2 6 1 1993.2 1993.06 0.4332121843 22
C1 1993 2 6 2 1993.2 1993.06 0.6142944847 22
C1 1993 2 6 3 1993.2 1993.06 0.4368145807 22
C1 1993 3 7 1 1993.3 1993.07 0.1635815183 23
C1 1993 3 7 2 1993.3 1993.07 0.6254998357 23
C1 1993 3 7 3 1993.3 1993.07 0.0251148200 23
C1 1993 3 8 1 1993.3 1993.08 0.2245898911 23
C1 1993 3 8 2 1993.3 1993.08 0.8971340141 23
C1 1993 3 8 3 1993.3 1993.08 0.4288076309 23
C1 1993 3 9 1 1993.3 1993.09 0.5289141138 23
C1 1993 3 9 2 1993.3 1993.09 0.7145691198 23
C1 1993 3 9 3 1993.3 1993.09 0.4038452509 23
C1 1993 4 10 1 1993.4 1993.10 0.7510774181 24
C1 1993 4 10 2 1993.4 1993.10 0.3815164396 24
C1 1993 4 10 3 1993.4 1993.10 0.3610818837 24
C1 1993 4 11 1 1993.4 1993.11 0.5930880003 24
C1 1993 4 11 2 1993.4 1993.11 0.9342462700 24
C1 1993 4 11 3 1993.4 1993.11 0.7744344805 24
C1 1993 4 12 1 1993.4 1993.12 0.0346028630 24
C1 1993 4 12 2 1993.4 1993.12 0.3402029259 24
C1 1993 4 12 3 1993.4 1993.12 0.3800805619 24
;
run;

data temp;
set have;
by id year month;
date=mdy(month,day,year);
if first.month;
format date yymmp.;
keep id year month date y_qrtly;
run;

%let window=2;
proc sql;
create table qrtly_weighted as
select *,(select sum(y_qrtly)/count(*) from temp where id=a.id and
   date between a.date and
   intnx('month',a.date,&window)) as y_qrtly_weighted
  
  from temp as a;
quit;

View solution in original post


All Replies
Super User
Posts: 10,035

Re: How to calculate weighted average in rolling windows

Here is an example to calculate rolling mean and std (window=3 months) , you can get start from it .

Code: Program

data have;
infile cards truncover expandtabs;
input id $ year quarter month day yr_qrt  $ yr_mth $ x_daily y_qrtly;
cards;
C1 1993 1 1 1 1993.1 1993.01 0.6991863822 21
C1 1993 1 1 2 1993.1 1993.01 0.4188095445 21
C1 1993 1 1 3 1993.1 1993.01 0.7669214303 21
C1 1993 1 2 1 1993.1 1993.02 0.7626337004 21
C1 1993 1 2 2 1993.1 1993.02 0.3310922338 21
C1 1993 1 2 3 1993.1 1993.02 0.4720571960 21
C1 1993 1 3 1 1993.1 1993.03 0.3182310962 21
C1 1993 1 3 2 1993.1 1993.03 0.5460470188 21
C1 1993 1 3 3 1993.1 1993.03 0.9614720765 21
;
run;
%let window=3;
proc sql;
create table rolling_month as
select *,(select mean(x_daily) from have where id=a.id and
   mdy(month,day,year) between mdy(a.month,a.day,a.year) and
   intnx('month',mdy(a.month,a.day,a.year),&window)) as x_mean,
   (select std(x_daily) from have where id=a.id and
   mdy(month,day,year) between mdy(a.month,a.day,a.year) and
   intnx('month',mdy(a.month,a.day,a.year),&window)) as x_std
  from have as a;
quit;
Frequent Contributor
Posts: 130

Re: How to calculate weighted average in rolling windows

Thank you Keshan!

This will calculate the mean and average for 3-month window, but each time rolling one day forward. Slightly different from my result.

Solution
‎07-03-2015 10:51 PM
Super User
Posts: 10,035

Re: How to calculate weighted average in rolling windows

Code: Program

data have;
infile cards truncover expandtabs;
input id $ year quarter month day yr_qrt  $ yr_mth $ x_daily y_qrtly;
cards;
C1 1993 1 1 1 1993.1 1993.01 0.6991863822 21
C1 1993 1 1 2 1993.1 1993.01 0.4188095445 21
C1 1993 1 1 3 1993.1 1993.01 0.7669214303 21
C1 1993 1 2 1 1993.1 1993.02 0.7626337004 21
C1 1993 1 2 2 1993.1 1993.02 0.3310922338 21
C1 1993 1 2 3 1993.1 1993.02 0.4720571960 21
C1 1993 1 3 1 1993.1 1993.03 0.3182310962 21
C1 1993 1 3 2 1993.1 1993.03 0.5460470188 21
C1 1993 1 3 3 1993.1 1993.03 0.9614720765 21
C1 1993 2 4 1 1993.2 1993.04 0.2448514237 22
C1 1993 2 4 2 1993.2 1993.04 0.0605496911 22
C1 1993 2 4 3 1993.2 1993.04 0.5099089561 22
C1 1993 2 5 1 1993.2 1993.05 0.5974678636 22
C1 1993 2 5 2 1993.2 1993.05 0.4144236295 22
C1 1993 2 5 3 1993.2 1993.05 0.5313275277 22
C1 1993 2 6 1 1993.2 1993.06 0.4332121843 22
C1 1993 2 6 2 1993.2 1993.06 0.6142944847 22
C1 1993 2 6 3 1993.2 1993.06 0.4368145807 22
C1 1993 3 7 1 1993.3 1993.07 0.1635815183 23
C1 1993 3 7 2 1993.3 1993.07 0.6254998357 23
C1 1993 3 7 3 1993.3 1993.07 0.0251148200 23
C1 1993 3 8 1 1993.3 1993.08 0.2245898911 23
C1 1993 3 8 2 1993.3 1993.08 0.8971340141 23
C1 1993 3 8 3 1993.3 1993.08 0.4288076309 23
C1 1993 3 9 1 1993.3 1993.09 0.5289141138 23
C1 1993 3 9 2 1993.3 1993.09 0.7145691198 23
C1 1993 3 9 3 1993.3 1993.09 0.4038452509 23
C1 1993 4 10 1 1993.4 1993.10 0.7510774181 24
C1 1993 4 10 2 1993.4 1993.10 0.3815164396 24
C1 1993 4 10 3 1993.4 1993.10 0.3610818837 24
C1 1993 4 11 1 1993.4 1993.11 0.5930880003 24
C1 1993 4 11 2 1993.4 1993.11 0.9342462700 24
C1 1993 4 11 3 1993.4 1993.11 0.7744344805 24
C1 1993 4 12 1 1993.4 1993.12 0.0346028630 24
C1 1993 4 12 2 1993.4 1993.12 0.3402029259 24
C1 1993 4 12 3 1993.4 1993.12 0.3800805619 24
;
run;

data temp;
set have;
by id year month;
date=mdy(month,day,year);
if first.month;
format date yymmp.;
keep id year month date y_qrtly;
run;

%let window=2;
proc sql;
create table qrtly_weighted as
select *,(select sum(y_qrtly)/count(*) from temp where id=a.id and
   date between a.date and
   intnx('month',a.date,&window)) as y_qrtly_weighted
  
  from temp as a;
quit;
Frequent Contributor
Posts: 130

Re: How to calculate weighted average in rolling windows

Thank you again! this looks very creative to me.

Frequent Contributor
Posts: 130

Re: How to calculate weighted average in rolling windows

I run the code with my data (more than 3 million observations), the proc sql procedure is very slow, it has been running for many hours, still no sign of finishing soon. any way to speed up a little bit? thanks!

Super User
Posts: 10,035

Re: How to calculate weighted average in rolling windows

If it is really big , suggest you to split your table into lots of small sub-table by ID variable .

And for each sub-table :

%let window=2;

proc sql;

create table qrtly_weighted_1 as

select *,(select sum(y_qrtly)/count(*) from temp_1 where

   date between a.date and

   intnx('month',a.date,&window)) as y_qrtly_weighted

  

  from temp as a;

quit;



Other wise, I am going to use Hash Table.



Super User
Posts: 10,035

Re: How to calculate weighted average in rolling windows

Here is Hash Table version of code .

Code: Program

data have;
infile cards truncover expandtabs;
input id $ year quarter month day yr_qrt  $ yr_mth $ x_daily y_qrtly;
cards;
C1 1993 1 1 1 1993.1 1993.01 0.6991863822 21
C1 1993 1 1 2 1993.1 1993.01 0.4188095445 21
C1 1993 1 1 3 1993.1 1993.01 0.7669214303 21
C1 1993 1 2 1 1993.1 1993.02 0.7626337004 21
C1 1993 1 2 2 1993.1 1993.02 0.3310922338 21
C1 1993 1 2 3 1993.1 1993.02 0.4720571960 21
C1 1993 1 3 1 1993.1 1993.03 0.3182310962 21
C1 1993 1 3 2 1993.1 1993.03 0.5460470188 21
C1 1993 1 3 3 1993.1 1993.03 0.9614720765 21
C1 1993 2 4 1 1993.2 1993.04 0.2448514237 22
C1 1993 2 4 2 1993.2 1993.04 0.0605496911 22
C1 1993 2 4 3 1993.2 1993.04 0.5099089561 22
C1 1993 2 5 1 1993.2 1993.05 0.5974678636 22
C1 1993 2 5 2 1993.2 1993.05 0.4144236295 22
C1 1993 2 5 3 1993.2 1993.05 0.5313275277 22
C1 1993 2 6 1 1993.2 1993.06 0.4332121843 22
C1 1993 2 6 2 1993.2 1993.06 0.6142944847 22
C1 1993 2 6 3 1993.2 1993.06 0.4368145807 22
C1 1993 3 7 1 1993.3 1993.07 0.1635815183 23
C1 1993 3 7 2 1993.3 1993.07 0.6254998357 23
C1 1993 3 7 3 1993.3 1993.07 0.0251148200 23
C1 1993 3 8 1 1993.3 1993.08 0.2245898911 23
C1 1993 3 8 2 1993.3 1993.08 0.8971340141 23
C1 1993 3 8 3 1993.3 1993.08 0.4288076309 23
C1 1993 3 9 1 1993.3 1993.09 0.5289141138 23
C1 1993 3 9 2 1993.3 1993.09 0.7145691198 23
C1 1993 3 9 3 1993.3 1993.09 0.4038452509 23
C1 1993 4 10 1 1993.4 1993.10 0.7510774181 24
C1 1993 4 10 2 1993.4 1993.10 0.3815164396 24
C1 1993 4 10 3 1993.4 1993.10 0.3610818837 24
C1 1993 4 11 1 1993.4 1993.11 0.5930880003 24
C1 1993 4 11 2 1993.4 1993.11 0.9342462700 24
C1 1993 4 11 3 1993.4 1993.11 0.7744344805 24
C1 1993 4 12 1 1993.4 1993.12 0.0346028630 24
C1 1993 4 12 2 1993.4 1993.12 0.3402029259 24
C1 1993 4 12 3 1993.4 1993.12 0.3800805619 24
;
run;

data temp;
set have;
by id year month;
date=mdy(month,day,year);
if first.month;
format date yymmp.;
keep id year month date y_qrtly;
run;

%let window=2;


data want;
if _n_ eq 1 then do;
  if 0 then set temp(rename=(y_qrtly=_y_qrtly));
  declare hash ha(dataset:'temp(rename=(y_qrtly=_y_qrtly))',hashexp:20);
  ha.definekey('id','date');
  ha.definedata('_y_qrtly');
  ha.definedone();
end;
set temp;
sum=0;n=0;
do i=date to intnx('month',date,&window);
if ha.find(key:id,key:i)=0 then do;sum+_y_qrtly;n+1;end;
end;
y_qrtly_weighted=divide(sum,n);
drop i sum n _y_qrtly;
run;
Frequent Contributor
Posts: 130

Re: How to calculate weighted average in rolling windows

really appreciate your kindly help!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 423 views
  • 7 likes
  • 2 in conversation