BookmarkSubscribeRSS Feed
Harsh_Vira
Calcite | Level 5

Hello All,

 

I am working on SAS version: 9.04.01M4P110916 which is hosted on RHEL 64bit. 

 

I want to optimize a Proc SQL query.

 

The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.
The records in source table are 101730000.

 

Below is the SAS query:

PROC SQL;
RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;
CREATE TABLE work.Test1 AS

select distinct  From libname.table
Where column8 in ( value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,
value16,value17);
QUIT;

 

The current query execution time: 06mins

 

Please suggest is there anyway i can improve the efficiency (execution time) of above step.

2 REPLIES 2
LinusH
Tourmaline | Level 20

Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?

 

Second - try your query direct in Hive.

 

If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.

 

Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.

Data never sleeps
JBailey
Barite | Level 11

Hi @Harsh_Vira 

 

@LinusH provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.

On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.

 

Given the number of rows in the file, I think 6 minutes may not be bad. 

 

Best wishes,

Jeff

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 593 views
  • 2 likes
  • 3 in conversation