BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chriscpa
Fluorite | Level 6

Hi all,

 

I am using PROC SQL to capture the mean of past period returns (please see below). The code that I used (adapted from @s_lassen's) works perfectly for monthly data . However, it cannot handle the daily data well due to the much larger size of the dataset (see attached screenshot). In particular, I would like to obtain the mean exret of the same weekday (and different weekday) from weeks t-20 to t-6 . It has been running for more than 20 hours, but has not given the output. Is there any way that I can improve the effeciency of the PROC SQL procedure in this case?

 

Thanks,

Chris

 

proc sql;
  create table return_lag_620
  as select main.*, 
     (select mean(exret)  from return3 as b 
       where gcode=main.gcode 
         and year=main.year
		 and weekday=main.weekday
         and week between main.week - 20 and main.week - 6)                
       as mean_same_weekday,  
	(select mean(exret)  from return3 as b 
       where gcode=main.gcode 
         and year=main.year
		 and weekday ne main.weekday
         and week between main.week - 20 and main.week - 6)                
       as mean_different_weekday 
  from return3 as main
  ;
quit;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

According to SAS query optimization guidelines, subqueries should be replaced by joins when possible. Such as in:

 

data return3;
   input gcode $ weekday week year exret;
   datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425
;

proc sql;
create table return_lag_620 as
select 
    a.gcode, a.year, a.week, a.weekday, a.exret,
    mean(case when a.weekday=b.weekday then b.exret else . end) as mean_same_weekday,
    mean(case when a.weekday=b.weekday then . else b.exret end) as mean_diff_weekday
from
    return3 as a left join 
    return3 as b on 
        a.gcode=b.gcode and a.year=b.year and b.week between a.week-20 and a.week-6
group by a.gcode, a.year, a.week, a.weekday, a.exret;
quit;

image.png

PG

View solution in original post

7 REPLIES 7
chriscpa
Fluorite | Level 6
 
chriscpa
Fluorite | Level 6

Hi Kurt,

 

Thanks for your reply. Please find the sample data below.

 

Kind regards,

Chris

 

 

data input_table;
   input gcode $ weekday $ week $ year $ exret $;
   datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425 ;

 

PGStats
Opal | Level 21

According to SAS query optimization guidelines, subqueries should be replaced by joins when possible. Such as in:

 

data return3;
   input gcode $ weekday week year exret;
   datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425
;

proc sql;
create table return_lag_620 as
select 
    a.gcode, a.year, a.week, a.weekday, a.exret,
    mean(case when a.weekday=b.weekday then b.exret else . end) as mean_same_weekday,
    mean(case when a.weekday=b.weekday then . else b.exret end) as mean_diff_weekday
from
    return3 as a left join 
    return3 as b on 
        a.gcode=b.gcode and a.year=b.year and b.week between a.week-20 and a.week-6
group by a.gcode, a.year, a.week, a.weekday, a.exret;
quit;

image.png

PG
ChrisNZ
Tourmaline | Level 20

subqueries should be replaced by joins when possible

This!

chriscpa
Fluorite | Level 6
Thanks PGStats,
It only took 48 secs to produce the nice output from more than 2 millions rows.
Cheers,
Chris
Ksharp
Super User
Try Hash Table or other skill. SQL is not suited for big data.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1940 views
  • 6 likes
  • 5 in conversation