DATA Step, Macro, Functions and more

query performance

Reply
Occasional Contributor CG1
Occasional Contributor
Posts: 13

query performance

Hi,

I am running one query in oracle that takes 20-25 minutes to execute but when I execute the same in SAS it takes 4 hours.
Can anyone please suggest how to check SAS longops and what are the ways to increase query performance.
Below is Oracle and SAS code

 

Oracle Code:

 

create table total_db nologging as
select * from Sales where CAMP='FPP';

 

select product
,count(unique case when INITCAP(age_ranges)=INITCAP('GenY') then id end) "GenY and Younger[18 thru 35)"
,count(unique case when INITCAP(age_ranges)=INITCAP('GenX') then id end) "GenX [35 thru 50)"
,count(unique case when INITCAP(age_ranges)=INITCAP('Boomers') then id end) "Boomers [50 thru 68)"
,count(unique case when INITCAP(age_ranges)=INITCAP('Older') then id end) "Older [68+)"
,count(unique case when INITCAP(age_ranges)=INITCAP('UNKNOWN') then id end) "Unknown or < 18"
,count(unique case when PRES_OF_CHILDREN='Y' then id end) "Yes"
,count(unique case when PRES_OF_CHILDREN='N' then id end) "No"
from total_db
group by product;

 

SAS Code;

libname MYDB 'L:\REP';
libname mydblib oracle user='ABCD' password='ABCD123' path='prod2' schema='CIRC';

 

proc sql;
create table MYDB.Total_DB as
select * from Sales where CAMP='FPP';
quit;

 

libname MYDB 'L:\REP';
ods CSV file="L:\BIA\brand_age.csv";

proc sql;
select product,
count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenY') then id end) as GenY
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenX') then id end) as GenX
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Boomers') then id end) as Boomers
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Older') then id end) as Older
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('UNKNOWN') then id end) as Unknown_age
,count(distinct case when PRES_OF_CHILDREN='Y' then id end)as Pres_Yes
,count(distinct case when PRES_OF_CHILDREN='N' then id end) as pres_No
from MYDB.total_db
group by product;
quit;
ods CSV close;

 

Please suggest where is the performance issue and how to monitor SAS query status

Regular Contributor
Regular Contributor
Posts: 156

Re: query performance

You can do a pass through to the oracle database:

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001044413.htm

 

Then sql the statements will be executed directly on oracle.

Occasional Contributor CG1
Occasional Contributor
Posts: 13

Re: query performance

Hi,

 

libname MYDB 'L:\REP';
ods CSV file="L:\BIA\brand_age.csv";

proc sql;
select product,
count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenY') then id end) as GenY
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenX') then id end) as GenX
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Boomers') then id end) as Boomers
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Older') then id end) as Older
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('UNKNOWN') then id end) as Unknown_age
,count(distinct case when PRES_OF_CHILDREN='Y' then id end)as Pres_Yes
,count(distinct case when PRES_OF_CHILDREN='N' then id end) as pres_No
from MYDB.total_db
group by product;
quit;
ods CSV close;

 

 

above is the query that is taking too much time to execute.

 

MYDB.total_db is dataset in SAS.

 

Problem is not table creation when I am fetching data from MYDB.total_db then it takes 4 hours

Super User
Super User
Posts: 7,942

Re: query performance

It would be better if you posted a sample of your data in the form of a datastep so that we could run it and see.  At a guess I would say that the reason its taking so long is all the sub nesting.  I would suggest separting the logic out into two blocks.  Block one consists of adding flags to the data, based on the logic you supplied, in a datastep.  This should be pretty quick, something like:

data inter;

  set have;

  if upcase(age_ranges)="GENY" then flag1=1;

  if upcase(age_ranges)="GENX" then flag2=1;

...

run;

 

Then run one of the stats methods on the data, for example from the documentation:

ods output nlevels=nlevels;
proc freq data=sashelp.class nlevels;
  tables name age sex;
run;

 

One final thing you might want to try is to do the aggregates separately and then join them back to a distinct list of products.  I.e. In your code each part of the code - the count, the distinct, the propcase() are all done on every row, do them in a subquery and they should only be done on the relevant data.

proc sql;
  create table WANT as
  select  A.PRODUCT
          ,B.GENX
          ,C.GENY
          ...
  from    (select distinct PRODUCT from HAVE) A
  left join (select distinct PRODUCT,count(distinct ID) as GENX from HAVE where upcase(AGE_RANGES)="GENX") B
  on      A.PRODUCT=B.PRODUCT
  left join (select distinct PRODUCT,count(distinct ID) as GENY from HAVE where upcase(AGE_RANGES)="GENY") C
  on      A.PRODUCT=C.PRODUCT
  ...;
quit;

 

Super User
Posts: 5,424

Re: query performance

Why are you copying the data to SAS if that takes longer time? Do the query in Oracle, and direct the output directly to your ODS destination.

Not sure if PROPCASE can be translated automatically to PL SQL, but you should be able to use UPCASE() instead.

 

Data never sleeps
Regular Contributor
Regular Contributor
Posts: 156

Re: query performance

As MYDB.total_db is a local SAS File I suggest writing that to the oracle DB (create table) and then submit the sql statements with oracle pass through. Then everything is done on the database which is much faster.

Super User
Super User
Posts: 7,942

Re: query performance

The problem there though, is you then have the overhead of writing the data to the database, and then sending the data back out from the database.  So savings on code execution would be lost in read/write.  Of course if the data is in the database already, then your right it does make sense to o the processing there and export directly to CSV from the database.

Regular Contributor
Regular Contributor
Posts: 156

Re: query performance

You're right it depends on read/write time. Yet often it is worth trying that's what my experience is. Also depends on size of table. Hopefully the local one is smaller :-)

Super User
Posts: 5,424

Re: query performance

Ok, there are some things that might be checked.

First, the performance of the SAS server.

options fullstimer msglevel=i;

will give you information on what resources are consumed.

Compare that with your SORTSIZE and MEMSIZE setting for the SAS session, and of course, physical memory available.

 

To understand how PROC SQL is planning the query, add the _method option to the PROC SQL statement.

 

The count(distinct...) construct is quite expensive to execute, so I'm pretty sure that there are things to trim here.

You got an example from @RW already. Here's another example (not SAS specific):

https://www.periscopedata.com/blog/use-subqueries-to-count-distinct-50x-faster.html

 

Data never sleeps
Super User
Posts: 3,250

Re: query performance

Remove the ODS CSV if it is a large file. Writing large CSV files via ODS is not recommended as it is very resource hungry. An alternative is PROC EXPORT.

Ask a Question
Discussion stats
  • 9 replies
  • 357 views
  • 1 like
  • 5 in conversation