BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chriscpa
Fluorite | Level 6

Hi all,

 

I am using PROC SQL to capture the mean of past period returns (please see below). The code that I used (adapted from @s_lassen's) works perfectly for monthly data . However, it cannot handle the daily data well due to the much larger size of the dataset (see attached screenshot). In particular, I would like to obtain the mean exret of the same weekday (and different weekday) from weeks t-20 to t-6 . It has been running for more than 20 hours, but has not given the output. Is there any way that I can improve the effeciency of the PROC SQL procedure in this case?

 

Thanks,

Chris

 

proc sql;
  create table return_lag_620
  as select main.*, 
     (select mean(exret)  from return3 as b 
       where gcode=main.gcode 
         and year=main.year
		 and weekday=main.weekday
         and week between main.week - 20 and main.week - 6)                
       as mean_same_weekday,  
	(select mean(exret)  from return3 as b 
       where gcode=main.gcode 
         and year=main.year
		 and weekday ne main.weekday
         and week between main.week - 20 and main.week - 6)                
       as mean_different_weekday 
  from return3 as main
  ;
quit;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

According to SAS query optimization guidelines, subqueries should be replaced by joins when possible. Such as in:

 

data return3;
   input gcode $ weekday week year exret;
   datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425
;

proc sql;
create table return_lag_620 as
select 
    a.gcode, a.year, a.week, a.weekday, a.exret,
    mean(case when a.weekday=b.weekday then b.exret else . end) as mean_same_weekday,
    mean(case when a.weekday=b.weekday then . else b.exret end) as mean_diff_weekday
from
    return3 as a left join 
    return3 as b on 
        a.gcode=b.gcode and a.year=b.year and b.week between a.week-20 and a.week-6
group by a.gcode, a.year, a.week, a.weekday, a.exret;
quit;

image.png

PG

View solution in original post

7 REPLIES 7
chriscpa
Fluorite | Level 6
 
chriscpa
Fluorite | Level 6

Hi Kurt,

 

Thanks for your reply. Please find the sample data below.

 

Kind regards,

Chris

 

 

data input_table;
   input gcode $ weekday $ week $ year $ exret $;
   datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425 ;

 

PGStats
Opal | Level 21

According to SAS query optimization guidelines, subqueries should be replaced by joins when possible. Such as in:

 

data return3;
   input gcode $ weekday week year exret;
   datalines;
14d1 5 36 2018 0.3057138508
14d1 6 36 2018 0.0115518923
14d1 5 37 2018 -0.018730594
14d1 6 37 2018 -0.072964828
14d1 5 38 2018 -0.120781876
14d1 6 38 2018 0.023588247
14d1 5 39 2018 -0.062521853
14d1 6 39 2018 -0.050833541
14d1 5 40 2018 -0.070952766
14d1 6 40 2018 0.0297087239
14d1 5 41 2018 -0.020763611
14d1 6 41 2018 -0.036628995
14d1 5 42 2018 -0.122045662
14d1 6 42 2018 0.0889606272
14d1 5 43 2018 0.0291037632
14d1 6 43 2018 -0.005378995
14d1 5 44 2018 -0.039277301
14d1 6 44 2018 -0.022922855
14d1 5 45 2018 -0.005378995
14d1 6 45 2018 0.0512247782
14d1 5 46 2018 0.0134889291
14d1 6 46 2018 -0.005378995
14d1 5 47 2018 -0.023897514
14d1 6 47 2018 -0.005378995
14d1 5 48 2018 -0.024609765
14d1 6 48 2018 -0.005378995
14d1 5 49 2018 0.0523133123
14d1 6 49 2018 -0.023560814
14d1 5 50 2018 -0.039277301
14d1 6 50 2018 -0.075554434
14d1 5 51 2018 0.0686950786
14d1 6 51 2018 0.0463451425
;

proc sql;
create table return_lag_620 as
select 
    a.gcode, a.year, a.week, a.weekday, a.exret,
    mean(case when a.weekday=b.weekday then b.exret else . end) as mean_same_weekday,
    mean(case when a.weekday=b.weekday then . else b.exret end) as mean_diff_weekday
from
    return3 as a left join 
    return3 as b on 
        a.gcode=b.gcode and a.year=b.year and b.week between a.week-20 and a.week-6
group by a.gcode, a.year, a.week, a.weekday, a.exret;
quit;

image.png

PG
ChrisNZ
Tourmaline | Level 20

subqueries should be replaced by joins when possible

This!

chriscpa
Fluorite | Level 6
Thanks PGStats,
It only took 48 secs to produce the nice output from more than 2 millions rows.
Cheers,
Chris
Ksharp
Super User
Try Hash Table or other skill. SQL is not suited for big data.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1995 views
  • 6 likes
  • 5 in conversation