Hi all,
I am using PROC SQL to capture the mean of past period returns (please see below). The code that I used (adapted from @s_lassen's) works perfectly for monthly data . However, it cannot handle the daily data well due to the much larger size of the dataset (see attached screenshot). In particular, I would like to obtain the mean exret of the same weekday (and different weekday) from weeks t-20 to t-6 . It has been running for more than 20 hours, but has not given the output. Is there any way that I can improve the effeciency of the PROC SQL procedure in this case?
Thanks,
Chris
proc sql;
create table return_lag_620
as select main.*,
(select mean(exret) from return3 as b
where gcode=main.gcode
and year=main.year
and weekday=main.weekday
and week between main.week - 20 and main.week - 6)
as mean_same_weekday,
(select mean(exret) from return3 as b
where gcode=main.gcode
and year=main.year
and weekday ne main.weekday
and week between main.week - 20 and main.week - 6)
as mean_different_weekday
from return3 as main
;
quit;
According to SAS query optimization guidelines, subqueries should be replaced by joins when possible. Such as in:
data return3;
input gcode $ weekday week year exret;
datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425
;
proc sql;
create table return_lag_620 as
select
a.gcode, a.year, a.week, a.weekday, a.exret,
mean(case when a.weekday=b.weekday then b.exret else . end) as mean_same_weekday,
mean(case when a.weekday=b.weekday then . else b.exret end) as mean_diff_weekday
from
return3 as a left join
return3 as b on
a.gcode=b.gcode and a.year=b.year and b.week between a.week-20 and a.week-6
group by a.gcode, a.year, a.week, a.weekday, a.exret;
quit;
Please post example data in usable form (data step with datalines). We can't test code against pictures.
Hi Kurt,
Thanks for your reply. Please find the sample data below.
Kind regards,
Chris
data input_table; input gcode $ weekday $ week $ year $ exret $; datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425 ;
According to SAS query optimization guidelines, subqueries should be replaced by joins when possible. Such as in:
data return3;
input gcode $ weekday week year exret;
datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425
;
proc sql;
create table return_lag_620 as
select
a.gcode, a.year, a.week, a.weekday, a.exret,
mean(case when a.weekday=b.weekday then b.exret else . end) as mean_same_weekday,
mean(case when a.weekday=b.weekday then . else b.exret end) as mean_diff_weekday
from
return3 as a left join
return3 as b on
a.gcode=b.gcode and a.year=b.year and b.week between a.week-20 and a.week-6
group by a.gcode, a.year, a.week, a.weekday, a.exret;
quit;
subqueries should be replaced by joins when possible
This!
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.