Hello All,
I am working on SAS version: 9.04.01M4P110916 which is hosted on RHEL 64bit.
I want to optimize a Proc SQL query.
The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.
The records in source table are 101730000.
Below is the SAS query:
PROC SQL;
RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;
CREATE TABLE work.Test1 AS
select distinct From libname.table
Where column8 in ( value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,
value16,value17);
QUIT;
The current query execution time: 06mins
Please suggest is there anyway i can improve the efficiency (execution time) of above step.
Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?
Second - try your query direct in Hive.
If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.
Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.
Hi @Harsh_Vira
@LinusH provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.
On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.
Given the number of rows in the file, I think 6 minutes may not be bad.
Best wishes,
Jeff
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: