BookmarkSubscribeRSS Feed
Harsh_Vira
Calcite | Level 5

Hello All,

 

I am working on SAS version: 9.04.01M4P110916 which is hosted on RHEL 64bit. 

 

I want to optimize a Proc SQL query.

 

The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.
The records in source table are 101730000.

 

Below is the SAS query:

PROC SQL;
RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;
CREATE TABLE work.Test1 AS

select distinct  From libname.table
Where column8 in ( value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,
value16,value17);
QUIT;

 

The current query execution time: 06mins

 

Please suggest is there anyway i can improve the efficiency (execution time) of above step.

2 REPLIES 2
LinusH
Tourmaline | Level 20

Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?

 

Second - try your query direct in Hive.

 

If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.

 

Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.

Data never sleeps
JBailey
Barite | Level 11

Hi @Harsh_Vira 

 

@LinusH provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.

On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.

 

Given the number of rows in the file, I think 6 minutes may not be bad. 

 

Best wishes,

Jeff

 

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1387 views
  • 2 likes
  • 3 in conversation