Hello,

I am trying to pull two fields from a Hive ORC table that is partitioned on month_end_date. Each month of data is about 5 million records, and both fields are of type double.

Environment:
SAS batch: SAS(R) Proprietary Software 9.4 (TS1M3)
Hive: Apache Hive version 1.2.1000.2.5.3.0-37

Here is the code:

options compress=yes macrogen symbolgen mlogic
        sastrace=',,,ds' sastraceloc=saslog nostsuffix
        sqlgeneration=dbms dbidirectexec
        sql_ip_trace=(note, source) msglevel=i
        source2 mprint MCOMPILENOTE=all;

%let URI="jdbc:=...";
libname lib "xxx";

proc sql;
  connect to hadoop (server="lnbradpp06" schema=xxxxxxx_xxxx uri=&uri.);
  CREATE TABLE lib.enrollment as
    select * from connection to hadoop (
      SELECT DISTINCT id, count
      FROM xxxxxx_xxxx.history
      WHERE MONTH_END_DATE = "2018-06-30"
    );
  DISCONNECT FROM HADOOP;
quit;

I don't know why this takes so long (about half an hour), or whether my options (accumulated from various SAS/Hadoop examples) are the problem. Has anyone run into this with SAS at their company? The solution I am being offered is to switch to R.