topic Re: Optimize SQL Query in SAS Programming

Optimize SQL Query

Harsh_Vira — Tue, 27 Aug 2019 10:30:19 GMT

Hello All,

I am working on SAS version: 9.04.01M4P110916 which is hosted on RHEL 64bit.

I want to optimize a Proc SQL query.

The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.
The records in source table are 101730000.

Below is the SAS query:

PROC SQL;
RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;
CREATE TABLE work.Test1 AS

select distinct From libname.table
Where column8 in ( value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,
value16,value17);
QUIT;

The current query execution time: 06mins

Please suggest is there anyway i can improve the efficiency (execution time) of above step.

Re: Optimize SQL Query

LinusH — Tue, 27 Aug 2019 11:56:21 GMT

Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?

Second - try your query direct in Hive.

If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.

Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.

Re: Optimize SQL Query

JBailey — Tue, 27 Aug 2019 12:22:12 GMT

Hi @Harsh_Vira

@LinusH provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.

On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.

Given the number of rows in the file, I think 6 minutes may not be bad.

Best wishes,

Jeff