Hello All,
I am working on SAS version: 9.04.01M4P110916 which is hosted on RHEL 64bit.
I want to optimize a Proc SQL query.
The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.
The records in source table are 101730000.
Below is the SAS query:
PROC SQL;
RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;
CREATE TABLE work.Test1 AS
select distinct From libname.table
Where column8 in ( value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,
value16,value17);
QUIT;
The current query execution time: 06mins
Please suggest is there anyway i can improve the efficiency (execution time) of above step.
Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?
Second - try your query direct in Hive.
If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.
Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.
Hi @Harsh_Vira
@LinusH provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.
On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.
Given the number of rows in the file, I think 6 minutes may not be bad.
Best wishes,
Jeff
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.