About lauracw4

lauracw4 · ‎08-06-2018

When I loaded a numeric ID variable to Hadoop using a simple data step, it went up as DOUBLE, adding a ".0". This causes problems with joins. What format should an ID have in SAS so that it loads to Hadoop as BIGINT?

    data had.hadoop_table;
      set lib.sas_table;
    run;
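One option that may help, assuming the SAS/ACCESS Interface to Hadoop on this release honors the DBTYPE= data set option, is to name the Hive column type explicitly when the table is created. The sketch below reuses the librefs from the data step in the question; the variable name `id` is an assumption, since the post does not name the column.

```sas
/* Sketch only: ask SAS/ACCESS to create the Hive column as BIGINT
   instead of the default DOUBLE. "id" is a hypothetical variable name. */
data had.hadoop_table (dbtype=(id='bigint'));
  set lib.sas_table;
run;
```

The option only matters when SAS creates the table, so an existing Hive table would have to be dropped first.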

lauracw4 · ‎08-06-2018

So, our solution turned out to be a non-SAS one. Before running the select distinct query, we have to execute a Hive set statement, since vectorized execution is not currently the default on our system:

    proc sql;
      connect to hadoop(server="xxxx" schema=xxxxxxx_xxxx uri=xxxx);
      execute(set hive.vectorized.execution.enabled=true) by hadoop;
      create table lib.part as
        select * from connection to hadoop
        (SELECT distinct part_id, part_date
           from xxxxxxx_xxxx.history);
      DISCONNECT FROM HADOOP;
    quit;

With this setting, a query that took over 35 minutes finished in 10 seconds.

lauracw4 · ‎07-23-2018

Well, I found out the select distinct query is slow in Beeline too. So I guess it is a non-SAS problem. Still, I need to find an explanation.

lauracw4 · ‎07-23-2018

Less than 1 minute!

lauracw4 · ‎07-23-2018

I am looking up implicit passthrough now.
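For reference, implicit pass-through means querying through a LIBNAME libref and letting SAS translate the PROC SQL step into HiveQL, rather than writing the Hive query inside `connection to hadoop`. A minimal sketch, assuming the same masked connection values used in the other posts and that `month_end_date` is a string column; the libref `hdp` and the output table name `lib.test_implicit` are made up for illustration:

```sas
/* Trace what SAS sends to Hive, to see whether the query was passed down. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Hypothetical libref; connection values are the masked ones from the posts. */
libname hdp hadoop server="xx" schema=xxxxxxx_xxxx uri=&uri.;

proc sql;
  /* SAS decides how much of this step to push to Hive. */
  create table lib.test_implicit as
    select distinct id, month_count
    from hdp.history
    where month_end_date = '2018-05-31';
quit;

libname hdp clear;
```

The SASTRACE output in the log shows the HiveQL that was actually generated, which helps tell whether the DISTINCT ran in Hive or in SAS.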

lauracw4 · ‎07-23-2018

So, I ran three queries. The first (count(*)) and second (select ...) came back immediately. The "select distinct" one is the problem; it will likely take half an hour as usual. I am not even sure I need select distinct, since this table should already be distinct. It is a just-in-case. But why would it take so much longer?

    proc sql;
      connect to hadoop(server="xx" schema=xxxxxxx_xxxx uri=&uri.);
      CREATE TABLE lib.test1 as
        select * from connection to hadoop
        ( SELECT count(*) as N
            FROM xxxxxx_xxxx.history
           WHERE MONTH_END_DATE = "2018-05-31" );
      DISCONNECT FROM HADOOP;
    quit;

    proc sql;
      connect to hadoop(server="xx" schema=xxxxxxx_xxxx uri=&uri.);
      CREATE TABLE lib.elig_mos_test2 as
        select * from connection to hadoop
        ( SELECT id, month_count
            FROM xxxxxxx_xxxx.history
           WHERE MONTH_END_DATE = "2018-05-31" );
      DISCONNECT FROM HADOOP;
    quit;

    proc sql;
      connect to hadoop(server="xx" schema=xxxxxxx_xxxx uri=&uri.);
      CREATE TABLE lib.test3 as
        select * from connection to hadoop
        ( SELECT distinct id, month_count
            FROM xxxxxxx_xxxx.history
           WHERE MONTH_END_DATE = "2018-05-31" );
      DISCONNECT FROM HADOOP;
    quit;
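Since the open question is whether the table is already distinct, a one-off duplicate check could settle it. This is only a sketch: it uses the same masked connection values as the queries above, the output table name `lib.dup_check` is made up, and because it still aggregates in Hive it may not be any faster than the select distinct itself.

```sas
proc sql;
  connect to hadoop(server="xx" schema=xxxxxxx_xxxx uri=&uri.);
  /* Returns one row per (id, month_count) pair that occurs more than once.
     An empty result would suggest the DISTINCT is not needed for this month. */
  create table lib.dup_check as
    select * from connection to hadoop
    ( SELECT id, month_count, count(*) as n
        FROM xxxxxxx_xxxx.history
       WHERE MONTH_END_DATE = "2018-05-31"
       GROUP BY id, month_count
      HAVING count(*) > 1 );
  disconnect from hadoop;
quit;
```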

lauracw4 · ‎07-20-2018

Hello, I am trying to pull two fields from a partitioned Hive ORC table. It is partitioned on month_end_date, and each month of data is about 5 million records. The SAS batch version is SAS (r) Proprietary Software 9.4 (TS1M3); Hive is Apache Hive (version 1.2.1000.2.5.3.0-37). Both fields are of type double.

    options compress=yes macrogen symbolgen mlogic
            sastrace=',,,ds' sastraceloc=saslog nostsuffix
            sqlgeneration=dbms dbidirectexec
            sql_ip_trace=(note, source) msglevel=i
            source2 source2 mprint MCOMPILENOTE=all ;

    %let URI="jdbc:=...";
    libname lib "xxx";

    proc sql;
      connect to hadoop(server="lnbradpp06" schema=xxxxxxx_xxxx uri=&uri.);
      CREATE TABLE lib.enrollment as
        select * from connection to hadoop
        ( SELECT distinct id, count
            FROM xxxxxx_xxxx.history
           WHERE MONTH_END_DATE = "2018-06-30" );
      DISCONNECT FROM HADOOP;
    quit;

I don't know why this would take so long (half an hour), or whether my options (accumulated from various SAS/Hadoop examples) are a problem. Has anyone had this issue with SAS at their company? The solution I am being offered is to switch to R.


Need to load a variable as BIGINT to Hadoop

Re: query run from SAS batch to a HIVE partitioned table takes half an...

Re: query run from SAS batch to a HIVE partitioned table takes half an...

Re: query run from SAS batch to a HIVE partitioned table takes half an...

Re: query run from SAS batch to a HIVE partitioned table takes half an...

Re: query run from SAS batch to a HIVE partitioned table takes half an...

query run from SAS batch to a HIVE partitioned table takes half an hou...
