SAS Pass through connection to hadoop

Hi,

Is there an efficient way to use a SAS pass-through connection to Hadoop when importing huge datasets? The data has nearly 2M rows and 8K columns. Thanks!

Re: SAS Pass through connection to hadoop

Posted in reply to krishnaram101

Please define "import".

From what format, to which format?

If the data shouldn't "touch" SAS during import, you could use EXECUTE blocks in PROC SQL to Hive, or PROC HADOOP for operations outside Hive.
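
A minimal sketch of the EXECUTE approach, assuming the goal is to build a subset table inside Hive without moving any rows into SAS. The server name, port, user, and the target table test_subset are hypothetical placeholders, not values from this thread.

```sas
proc sql;
  /* Hypothetical connection details; replace with your own */
  connect to hadoop (server='myhive.example.com' port=10000
                     user=myuser password="XXXXX" schema=default);

  /* Everything inside EXECUTE (...) is sent to Hive as-is, so the
     rows are read and written on the cluster and never reach SAS */
  execute (drop table if exists test_subset) by hadoop;
  execute (create table test_subset as
           select * from test limit 100000) by hadoop;

  disconnect from hadoop;
quit;
```

Only status information comes back to the SAS log here; reading test_subset into a SAS data set would be a separate step.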

Data never sleeps

Re: SAS Pass through connection to hadoop

I used the following code to read the data from Hadoop. It took 6 hours to fetch 100,000 records with 8K columns, which seems very slow. Without the OPTIONS statements, it took 8 hours. Could you please check and suggest improvements?

/* Enable scatter-read/gather-write I/O and enlarge the SAS page buffers */
options SGIO=yes;
options bufno=2000 bufsize=48K;
libname sastest 'E:\SASMA\SASUserData\User\krishnaramasamy\Hadoop data';

proc sql;
  /* Explicit pass-through: the inner query runs in Hive */
  connect to hadoop (user=%LOWCASE(&SYSUSERID.) password="XXXXX"
    server='YYYYYY' uri='jdbc:hive2://YYYYYYY.com:8443/default?hive.server2.transport.mode=http;hive.execution.engine=tez;hive.server2.thrift.http.path=gateway/hdpprod/hive;hive.execution.engine=tez' schema=ZZZZZ);
  /* The result set is pulled back and written to a SAS data set */
  create table sastest.test as select * from connection to hadoop
  (
    select * from test
    limit 100000
  );
  disconnect from hadoop;
quit;

Re: SAS Pass through connection to hadoop

Posted in reply to krishnaram101

This looks like a Hadoop/Hive administration issue rather than a SAS one, since it is the query inside Hive that takes the time (unless you have extremely small bandwidth to the Hadoop cluster).
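
One way to check where the time goes is to time the Hive-side work and the transfer to SAS separately. The following is only a sketch under assumptions: the macro variable conn, the scratch table test_subset, and the data set work.pulled are hypothetical names, and the connection options reuse the placeholders from the question (add the uri= option from the original code as needed).

```sas
/* Placeholder connection options copied from the question */
%let conn = user=%lowcase(&sysuserid.) password="XXXXX"
            server='YYYYYY' schema=ZZZZZ;

/* Step 1: run the query entirely inside Hive; no rows reach SAS */
%let t0 = %sysfunc(datetime());
proc sql;
  connect to hadoop (&conn);
  execute (drop table if exists test_subset) by hadoop;
  execute (create table test_subset as
           select * from test limit 100000) by hadoop;
  disconnect from hadoop;
quit;
%put NOTE: Hive-side step took %sysevalf(%sysfunc(datetime()) - &t0) seconds.;

/* Step 2: pull the already materialized rows into SAS */
%let t1 = %sysfunc(datetime());
proc sql;
  connect to hadoop (&conn);
  create table work.pulled as
    select * from connection to hadoop
    (select * from test_subset);
  disconnect from hadoop;
quit;
%put NOTE: Transfer step took %sysevalf(%sysfunc(datetime()) - &t1) seconds.;
```

If step 1 accounts for most of the elapsed time, the bottleneck is on the Hive side; if step 2 dominates, the network or the SAS-side write is the more likely suspect.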
