Hi everybody,
I'm working with SAS 9.4 connecting to Hadoop/Cloudera via SAS/ACCESS Interface to Hadoop. When I run a query with explicit pass-through, it resolves quickly in the Hive database, but creating the resulting table in SAS is very slow. I think there may be opportunities to improve the I/O.
I'd like some help improving I/O with huge tables (>20 million rows) on SAS 9.4. I have 3 compute servers in a SAS Grid, each with 8 CPU cores and 64 GB of RAM.
I know about the BUFSIZE, BUFNO, and BLKSIZE options, but I don't have the expertise to combine them into efficient settings for better I/O.
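To show what I mean, this is roughly the pattern I have in mind (libref, server, schema, and table names are placeholders, connection options are simplified, and the BUFSIZE=/BUFNO= values are just examples, not tuned settings):

libname hdp hadoop server="myhiveserver" schema=mydb;   /* placeholder connection */

data work.big_copy (bufsize=128k bufno=10);             /* output buffering for the SAS table */
    set hdp.big_table;
run;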
How slow is too slow? I suggest you post the SAS logs of both a fast Passthrough query and a slow SAS version of the same query. Without evidence we would just be guessing what is happening.
I concur with what @SASKiwi wrote.
Just one thought though: what potentially takes up the time is moving the data to the SAS side and writing the SAS table. Make sure that you don't have variables of type STRING on the Hadoop side that get mapped to a SAS CHAR(32767). If that happens, cast the string to a VARCHAR() with an appropriate length on the Hive side via explicit pass-through SQL.
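For example, something along these lines, where the server, schema, table, and column names are placeholders for illustration:

proc sql;
   connect to hadoop (server="myhiveserver" schema=mydb);
   create table work.mytable as
   select * from connection to hadoop
      (  select id,
                cast(long_text as varchar(200)) as long_text,  /* avoid CHAR(32767) on the SAS side */
                amount
         from  big_table  );
   disconnect from hadoop;
quit;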
See also: Issues When Converting Data from Hive to SAS.