BookmarkSubscribeRSS Feed
CG1
Calcite | Level 5 CG1
Calcite | Level 5

Hi,

I am running one query in oracle that takes 20-25 minutes to execute but when I execute the same in SAS it takes 4 hours.
Can anyone please suggest how to check SAS longops and what are the ways to increase query performance.
Below is Oracle and SAS code

 

Oracle Code:

 

create table total_db nologging as
select * from Sales where CAMP='FPP';

 

select product
,count(unique case when INITCAP(age_ranges)=INITCAP('GenY') then id end) "GenY and Younger[18 thru 35)"
,count(unique case when INITCAP(age_ranges)=INITCAP('GenX') then id end) "GenX [35 thru 50)"
,count(unique case when INITCAP(age_ranges)=INITCAP('Boomers') then id end) "Boomers [50 thru 68)"
,count(unique case when INITCAP(age_ranges)=INITCAP('Older') then id end) "Older [68+)"
,count(unique case when INITCAP(age_ranges)=INITCAP('UNKNOWN') then id end) "Unknown or < 18"
,count(unique case when PRES_OF_CHILDREN='Y' then id end) "Yes"
,count(unique case when PRES_OF_CHILDREN='N' then id end) "No"
from total_db
group by product;

 

SAS Code;

libname MYDB 'L:\REP';
libname mydblib oracle user='ABCD' password='ABCD123' path='prod2' schema='CIRC';

 

proc sql;
create table MYDB.Total_DB as
select * from Sales where CAMP='FPP';
quit;

 

libname MYDB 'L:\REP';
ods CSV file="L:\BIA\brand_age.csv";

proc sql;
select product,
count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenY') then id end) as GenY
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenX') then id end) as GenX
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Boomers') then id end) as Boomers
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Older') then id end) as Older
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('UNKNOWN') then id end) as Unknown_age
,count(distinct case when PRES_OF_CHILDREN='Y' then id end)as Pres_Yes
,count(distinct case when PRES_OF_CHILDREN='N' then id end) as pres_No
from MYDB.total_db
group by product;
quit;
ods CSV close;

 

Please suggest where is the performance issue and how to monitor SAS query status

9 REPLIES 9
Eva
Quartz | Level 8 Eva
Quartz | Level 8

You can do a pass through to the oracle database:

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001044413.htm

 

Then sql the statements will be executed directly on oracle.

CG1
Calcite | Level 5 CG1
Calcite | Level 5

Hi,

 

libname MYDB 'L:\REP';
ods CSV file="L:\BIA\brand_age.csv";

proc sql;
select product,
count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenY') then id end) as GenY
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('GenX') then id end) as GenX
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Boomers') then id end) as Boomers
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('Older') then id end) as Older
,count(distinct case when PROPCASE(age_ranges)=PROPCASE('UNKNOWN') then id end) as Unknown_age
,count(distinct case when PRES_OF_CHILDREN='Y' then id end)as Pres_Yes
,count(distinct case when PRES_OF_CHILDREN='N' then id end) as pres_No
from MYDB.total_db
group by product;
quit;
ods CSV close;

 

 

above is the query that is taking too much time to execute.

 

MYDB.total_db is dataset in SAS.

 

Problem is not table creation when I am fetching data from MYDB.total_db then it takes 4 hours

RW9
Diamond | Level 26 RW9
Diamond | Level 26

It would be better if you posted a sample of your data in the form of a datastep so that we could run it and see.  At a guess I would say that the reason its taking so long is all the sub nesting.  I would suggest separting the logic out into two blocks.  Block one consists of adding flags to the data, based on the logic you supplied, in a datastep.  This should be pretty quick, something like:

data inter;

  set have;

  if upcase(age_ranges)="GENY" then flag1=1;

  if upcase(age_ranges)="GENX" then flag2=1;

...

run;

 

Then run one of the stats methods on the data, for example from the documentation:

ods output nlevels=nlevels;
proc freq data=sashelp.class nlevels;
  tables name age sex;
run;

 

One final thing you might want to try is to do the aggregates separately and then join them back to a distinct list of products.  I.e. In your code each part of the code - the count, the distinct, the propcase() are all done on every row, do them in a subquery and they should only be done on the relevant data.

proc sql;
  create table WANT as
  select  A.PRODUCT
          ,B.GENX
          ,C.GENY
          ...
  from    (select distinct PRODUCT from HAVE) A
  left join (select distinct PRODUCT,count(distinct ID) as GENX from HAVE where upcase(AGE_RANGES)="GENX") B
  on      A.PRODUCT=B.PRODUCT
  left join (select distinct PRODUCT,count(distinct ID) as GENY from HAVE where upcase(AGE_RANGES)="GENY") C
  on      A.PRODUCT=C.PRODUCT
  ...;
quit;

 

LinusH
Tourmaline | Level 20

Why are you copying the data to SAS if that takes longer time? Do the query in Oracle, and direct the output directly to your ODS destination.

Not sure if PROPCASE can be translated automatically to PL SQL, but you should be able to use UPCASE() instead.

 

Data never sleeps
Eva
Quartz | Level 8 Eva
Quartz | Level 8

As MYDB.total_db is a local SAS File I suggest writing that to the oracle DB (create table) and then submit the sql statements with oracle pass through. Then everything is done on the database which is much faster.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The problem there though, is you then have the overhead of writing the data to the database, and then sending the data back out from the database.  So savings on code execution would be lost in read/write.  Of course if the data is in the database already, then your right it does make sense to o the processing there and export directly to CSV from the database.

Eva
Quartz | Level 8 Eva
Quartz | Level 8

You're right it depends on read/write time. Yet often it is worth trying that's what my experience is. Also depends on size of table. Hopefully the local one is smaller 🙂

LinusH
Tourmaline | Level 20

Ok, there are some things that might be checked.

First, the performance of the SAS server.

options fullstimer msglevel=i;

will give you information on what resources are consumed.

Compare that with your SORTSIZE and MEMSIZE setting for the SAS session, and of course, physical memory available.

 

To understand how PROC SQL is planning the query, add the _method option to the PROC SQL statement.

 

The count(distinct...) construct is quite expensive to execute, so I'm pretty sure that there are things to trim here.

You got an example from @RW already. Here's another example (not SAS specific):

https://www.periscopedata.com/blog/use-subqueries-to-count-distinct-50x-faster.html

 

Data never sleeps
SASKiwi
PROC Star

Remove the ODS CSV if it is a large file. Writing large CSV files via ODS is not recommended as it is very resource hungry. An alternative is PROC EXPORT.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1333 views
  • 1 like
  • 5 in conversation