BookmarkSubscribeRSS Feed
Harsh_Vira
Calcite | Level 5

Hello All,

 

I am working on SAS version: 9.04.01M4P110916 which is hosted on RHEL 64bit. 

 

I want to optimize a Proc SQL query.

 

The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.
The records in source table are 101730000.

 

Below is the SAS query:

PROC SQL;
RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;
CREATE TABLE work.Test1 AS

select distinct  From libname.table
Where column8 in ( value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,
value16,value17);
QUIT;

 

The current query execution time: 06mins

 

Please suggest is there anyway i can improve the efficiency (execution time) of above step.

2 REPLIES 2
LinusH
Tourmaline | Level 20

Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?

 

Second - try your query direct in Hive.

 

If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.

 

Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.

Data never sleeps
JBailey
Barite | Level 11

Hi @Harsh_Vira 

 

@LinusH provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.

On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.

 

Given the number of rows in the file, I think 6 minutes may not be bad. 

 

Best wishes,

Jeff

 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1143 views
  • 2 likes
  • 3 in conversation