BookmarkSubscribeRSS Feed
SASUserRocks
Calcite | Level 5

Hello friends,

 

I am facing issue with optimizing this long running proc sql query. I compute daily impressions and clicks and aggregate records along with leads information. This query takes very long time. Please suggest some way forward. Impressions and clicks table also contain only 1 day data.

 

proc sql;
create table libshare.Total_Click_Impressions_Lead as
select distinct A.date,A.Product, count( A.ImpressionId) as Impressions,
count( B.clickid) as Clicks ,
count( distinct row_id||x_adcb_prod_group||x_adcb_prod_code||utm_medium) as Leads ,
startdate as start_date ,enddate as end_date 
from impressions A
left join clicks B on A.date = B.date and A.Product = B.ProductLabel
left join (select * from libshare.leads where x_adcb_last_lead_action_date='09Aug2021'd) C
on intnx('day',A.date,0,'B') = intnx('day',C.x_adcb_last_lead_action_date,0,'B')
and A.Product = C.adcb_utm_camp
 left join(select min(date) format date11. as startdate , max(date) format date11. as enddate ,
product from libshare.Total_Click_Impressions_Lead group by product) D on A.Product = D.Product
group by A.date,A.Product ;

9 REPLIES 9
SASUserRocks
Calcite | Level 5

slight change in query. :-

 

proc sql;
create table libshare.Total_Click_Impressions_Lead1 as
select distinct A.date,A.Product, count( distinct A.ImpressionId) as Impressions,
count( distinct B.clickid) as Clicks ,
count( distinct row_id||x_adcb_prod_group||x_adcb_prod_code||utm_medium) as Leads  ,
startdate as start_date ,enddate as end_date 
from impressions A
left join clicks B on A.date = B.date and A.Product = B.ProductLabel
left join (select * from libshare.leads where x_adcb_last_lead_action_date='09Aug2021'd) C
on intnx('day',A.date,0,'B') = intnx('day',C.x_adcb_last_lead_action_date,0,'B')
and A.Product = C.adcb_utm_camp
 left join(select min(date) format date11. as startdate , max(date) format date11. as enddate ,
product from libSHARE.Total_Click_Impressions_Lead group by product) D on A.Product = D.Product
gr

maguiremq
SAS Super FREQ

First, I highly recommend you format your code. Second, you should use the `Insert SAS Code` in the body header of the post to paste your code. I went ahead and formatted your code (your first post, not your second) in my own way which may not be preferable to you, but it makes it more readable.

proc sql;
	create table 	libshare.Total_Click_Impressions_Lead as
		select 		
					distinct A.date,
					A.Product,
					count( A.ImpressionId) as Impressions,
					count( B.clickid) as Clicks,
					count(distinct row_id||x_adcb_prod_group||x_adcb_prod_code||utm_medium) as Leads,
					startdate as start_date ,enddate as end_date 
		from 
					impressions A
						left join 
					clicks B 
							on 	A.date = B.date and 
								A.Product = B.ProductLabel
						left join 	(
										select 
													* 
										from 
													libshare.leads
										where 
													x_adcb_last_lead_action_date = '09Aug2021'd
									) C
							on 	intnx('day',A.date,0,'B') = intnx('day',C.x_adcb_last_lead_action_date,0,'B') and
								A.Product = C.adcb_utm_camp
 						left join	(
 										select 
 													min(date) format date11. as startdate,
 													max(date) format date11. as enddate,
													product
										from 
													libshare.Total_Click_Impressions_Lead
										group by
													product
									) D 
							on A.Product = D.Product
		group by 	A.date, A.Product;
quit;

Most importantly, I would request that we see your log and any warnings/messages that you're receiving. Since we don't have any data to reproduce your example, it makes it a bit more difficult.

 

I would presume that all the joins that you have in here are slowing the program down, particularly the 'SELECT *" that you have. But again, I don't have your data, so I can only run off assumptions.

 

It might help to break the query down into smaller parts to see where your issues are occurring.

SASUserRocks
Calcite | Level 5

Thanks a lot for your response.

 

Here Impressions contains 60781 records and clicks contain 2058. and leads contain around 10. Its taking very long time that i force stop every time, commenting some codes , i think the issue started to come for certain products where impressions came very high this month causing this slowness. Do you require sample data here.

also i think joining with leads causes more slowness as join with impressions and clicks was giving results within 30 mins..

 

 

SASUserRocks
Calcite | Level 5

changed leads block to below and evaluating the execution.

 

left join (
select
row_id,prod_group,x_adcb_prod_code,utm_medium
from
libshare.leads
where
lead_action_date = '09Aug2021'd
) C

ballardw
Super User

One thing you have that is wasting clock cycles:

on intnx('day',A.date,0,'B') = intnx('day',C.x_adcb_last_lead_action_date,0,'B')

INTNX 0 for 'day' if the value is a date does nothing, and the "B" is also practically useless. There is no "beginning" for a DAY value. If you have a DATETIME and are using DTDAY as the interval then you get the time portion to reflect the beginning of the datetime for that date part. With date and 0 though just wasting time calling the function as the result is the same.

In the example below note that all the Y variables have the same value as X.

data example;
   x='01JUL2021'd;
   y1=intnx('day',x,0);
   y2=intnx('day',x,0,'B');
   y3=intnx('day',x,0,'E');
   y4=intnx('day',x,0,'S');
   format x y: date9.;
   put _all_;
run;

You code may run a little bit faster using

on A.date = C.x_adcb_last_lead_action_date
SASUserRocks
Calcite | Level 5
A.date and C.lead action date are in different formats, is this okay to equating like this.
ballardw
Super User

@SASUserRocks wrote:
A.date and C.lead action date are in different formats, is this okay to equating like this.

FORMAT has absolutely no impact on comparing values.

You are just wasting clock cycles.

You asked about optimization and "running long hours". Small thing things like these unneeded function calls add to longer running times.

SASUserRocks
Calcite | Level 5

Fine, I just took care of this and added extra filter for D block, query still running... 

 

proc sql;
create table libshare.Total_Click_Impressions_Lead3 as
select
distinct A.date,
A.Product,
count( A.ImpressionId) as Impressions,
count( B.clickid) as Clicks,
count(distinct row_id||x_adcb_prod_group||adcb_utm_camp||utm_medium) as Leads,
startdate as start_date ,enddate as end_date
from
impressions A
left join
clicks B
on A.date = B.date and
A.Product = B.ProductLabel
left join (
select
row_id,x_adcb_prod_group,adcb_utm_camp,utm_medium,x_adcb_last_lead_action_date
from
libshare.LEADS
where
x_adcb_last_lead_action_date = '09Aug2021'd
) C
on A.date = C.x_adcb_last_lead_action_date and
A.Product = C.adcb_utm_camp
left join (
select
min(date) format date11. as startdate,
max(date) format date11. as enddate,
product
from
libshare.Total_Click_Impressions_Lead
group by
product having max(date) >='09Aug2021'd
) D
on A.Product = D.Product
group by A.date, A.Product;
quit;

SASUserRocks
Calcite | Level 5
proc sql;
	create table 	libshare.Total_Click_Impressions_Lead3 as
		select 		
					distinct A.date,
					A.Product,
					count( A.ImpressionId) as Impressions,
					count( B.clickid) as Clicks,
					count(distinct row_id||x_adcb_prod_group||adcb_utm_camp||utm_medium) as Leads,
					startdate as start_date ,enddate as end_date 
		from 
					impressions A
						left join 
					clicks B 
							on 	A.date = B.date and 
								A.Product = B.ProductLabel
						left join 	(
										select 
												 row_id,x_adcb_prod_group,adcb_utm_camp,utm_medium,x_adcb_last_lead_action_date
										from 
													libshare.LEADS
										where 
													x_adcb_last_lead_action_date = '09Aug2021'd
									) C
							on 	A.date = C.x_adcb_last_lead_action_date and
								A.Product = C.adcb_utm_camp
 						left join	(
										select 
 													min(date) format date11. as startdate,
 													max(date) format date11. as enddate,
													product
										from 
													libshare.Total_Click_Impressions_Lead
													group by
													product having max(date) >='09Aug2021'd
									) D 
							on A.Product = D.Product
		group by 	A.date, A.Product;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1474 views
  • 1 like
  • 3 in conversation