Hi Daniele,
I have reproduced this.
When I have the SAS_HADOOP_CONFIG_PATH= environment variable set properly, it works. If it isn't set properly, SAS creates the table but doesn't load any data into it.
I have the following files in there (I got them from the CDH quickstart):
core-site.xml
hdfs-site.xml
hive-site.xml
mapred-site.xml
yarn-site.xml
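For reference, one way to point SAS at that directory is in the SAS configuration file (sasv9.cfg). This is only a sketch: the paths are examples for a Windows client, and SAS_HADOOP_JAR_PATH= is shown only because the Hadoop engine also needs a directory of Hadoop client JAR files.

-SET SAS_HADOOP_CONFIG_PATH "C:\hadoop\conf"
-SET SAS_HADOOP_JAR_PATH "C:\hadoop\jars"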
Cloudera Manager will download these for you. Connect a browser on your PC to Cloudera Manager on the VM; that makes it easy to get the files onto your machine.
From the home screen, select Hive.
Click the Actions button in the upper right-hand corner and select Download Client Configuration from the pull-down.
This downloads a ZIP file to your machine.
Take the XML files and place them into the directory pointed to by your SAS_HADOOP_CONFIG_PATH= environment variable. Restart SAS so it picks up the new environment. Then run this code to see if it works.
libname mycdh hadoop server=quickstart user=cloudera;

data mycdh.cars;
   set sashelp.cars;
run;

/* this is a streaming HDFS read */
proc sql;
   select * from mycdh.cars;
quit;

/* this invokes MapReduce */
proc sql;
   select count(*) from mycdh.cars;
quit;
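If the libref assigns but the table comes back empty, a quick sanity check (just a sketch) is to confirm that the SAS session actually sees the environment variables:

/* print the Hadoop environment variables as the SAS session sees them */
%put SAS_HADOOP_CONFIG_PATH=%sysget(SAS_HADOOP_CONFIG_PATH);
%put SAS_HADOOP_JAR_PATH=%sysget(SAS_HADOOP_JAR_PATH);

Both should resolve to directories that exist and are readable by the SAS session.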
Hi Jeff,
Thank you very much for the suggestions. My focus now is to solve the problem on the physical Hadoop cluster.
Digging deeper into the problem, we found a possible issue with the network communication. In particular, it seems that complete visibility is required between the SAS client and all HDFS data nodes. Could you confirm this? Does the SAS client need to reach every data node on the HDFS ports (50010 / 50020 / 50075)? Are there other services or ports on the data nodes that need to be reachable?
Right now we can reach only the name node on port 8020 and the Hive server on port 10000, but there are firewall rules between our client and the data nodes of the cluster.
Thank you
D.
As I stated initially, you need to make sure the data nodes are reachable from the SAS client; that is one of the few causes of this type of issue. I can think of three reasons it can occur:
1. The sticky bit is set on /tmp, or the HDFS_TEMPDIR= location is not writable for some other reason.
2. The data nodes are not reachable from the SAS client, by IP address and port, or by hostname if dfs.datanode.use.datanode.hostname is set in the HDFS configuration (see the reachability sketch after this list).
3. The name node needs to be reformatted because the configuration of the data nodes has changed or has otherwise been corrupted.
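For point 2, a rough way to test reachability from the SAS session itself is a PIPE filename. This is only a sketch: it assumes a UNIX SAS server with XCMD enabled and a netcat (nc) that supports the -z flag, and the host name and port are placeholders for one of your data nodes and the ports listed above.

/* hypothetical data node host and port - replace with your own values */
filename chk pipe "nc -z -w 5 datanode01.example.com 50010; echo exit_code=$?";
data _null_;
   infile chk;
   input;
   put _infile_;   /* exit_code=0 means the port answered */
run;
filename chk clear;

Anything other than exit_code=0 suggests the port is blocked, which would match the firewall rules you describe.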
Hi,
Thanks for the quick response.
This is the situation:
[tucloud@usqrtpl5394 ~]$ hdfs dfs -ls /
Found 5 items
drwxrwxrwx - hdfs supergroup 0 2015-03-18 18:50 /crmon
drwxrwxrwx - hdfs supergroup 0 2015-02-16 10:51 /h2oTEMP
drwxr-xr-x - sasep sasep 0 2015-02-27 15:21 /sas
drwxrwxrwt - hdfs supergroup 0 2015-03-25 15:33 /tmp
drwxr-xr-x - hdfs supergroup 0 2015-02-10 13:21 /user
The sticky bit is on, but I have the same problem even if I define the HDFS_TEMPDIR= option in the LIBNAME statement:
libname hive hadoop
   server='usqrtpl5396.internal.unicreditgroup.eu'
   user=tucloud
   HDFS_TEMPDIR="/user/tucloud"
   database=crmon
   subprotocol=hive2;
I can reach every node with the ping command from my SAS server.
Thanks
D.
Do you have access to the name node and data node logs?
I'm able to connect to the name node and open these logs:
[tucloud@usqrtpl5394 hadoop-hdfs]$ pwd
/var/log/hadoop-hdfs
[tucloud@usqrtpl5394 hadoop-hdfs]$ ls
hadoop-cmf-hdfs-DATANODE-usqrtpl5394.internal.unicreditgroup.eu.log.out hdfs-audit.log
hadoop-cmf-hdfs-FAILOVERCONTROLLER-usqrtpl5394.internal.unicreditgroup.eu.log.out hdfs-audit.log.1
hadoop-cmf-hdfs-JOURNALNODE-usqrtpl5394.internal.unicreditgroup.eu.log.out hdfs-audit.log.2
hadoop-cmf-hdfs-NAMENODE-usqrtpl5394.internal.unicreditgroup.eu.log.out SecurityAuth-hdfs.audit
hadoop-cmf-hdfs-NAMENODE-usqrtpl5394.internal.unicreditgroup.eu.log.out.1 stacks
hadoop-cmf-hdfs-NAMENODE-usqrtpl5394.internal.unicreditgroup.eu.log.out.2
[tucloud@usqrtpl5394 hadoop-hdfs]$
but no errors appear (tailing the log files while the SAS script is running).
Is /var/log/hadoop-hdfs the correct log directory?
Hi Daniele,
Which version of SAS are you using? SAS 9.4M2 will tell you that there is a problem with the sticky bit. I am using the CDH 5.3 quickstart and SAS 9.4M2.
The CDH Quickstart works perfectly, out of the box. I had to set the sticky bit on /tmp in order to generate the following error.
# set the sticky bit from the quickstart terminal
sudo -u hdfs hadoop fs -chmod 1777 /tmp
# remove the sticky bit
sudo -u hdfs hadoop fs -chmod -t /tmp
Output from the SAS log.
6 libname mycdh hadoop server=quickstart port=10000;
NOTE: Libref MYCDH was successfully assigned as follows:
Engine: HADOOP
Physical Name: jdbc:hive2://quickstart:10000/default
7 data mycdh.cars2;
8 set sashelp.cars;
9 run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set MYCDH.CARS2 has 428 observations and 15 variables.
ERROR: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
ERROR: Unable to execute Hadoop query.
ERROR: Execute error on statement: LOAD DATA INPATH
'/tmp/sasdata-2015-03-25-14-49-40-774-e-00001.dlv' OVERWRITE INTO TABLE `CARS2`. Could not
load /tmp/sasdata-2015-03-25-14-49-40-774-e-00001.dlv into table CARS2 in schema DEFAULT.
A common cause of this issue is conflicting HDFS permissions between the data file and the
Hive warehouse directory for the table. Another possible cause is the "sticky" bit set on
HDFS directory /tmp.
NOTE: DATA statement used (Total process time):
real time 0.55 seconds
cpu time 0.06 seconds
If you are using an older version of SAS, say 9.4M1, you will not see this error. It will look like it works, but it doesn't. Previous releases of the CDH quickstart had the sticky bit set on /tmp.
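If you want to check whether your user can write under /tmp on HDFS independently of the table load, one option is a small PROC HADOOP test. Again, only a sketch: it assumes PROC HADOOP is available in your deployment and picks up the same SAS_HADOOP_CONFIG_PATH, and the directory name is just an example.

proc hadoop username='cloudera' verbose;
   hdfs mkdir='/tmp/sas_write_test';    /* create a scratch directory */
   hdfs delete='/tmp/sas_write_test';   /* and clean it up again      */
run;

If this succeeds, it at least confirms that the SAS side can talk to HDFS and create and remove objects under /tmp with that user.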
Hi Jeff,
I'm using 9.4M2.
I have the same problem even if the sticky bit is turned off (or if I change the HDFS_TEMPDIR= path).
Opening the log file on the VM, I found these messages:
2015-03-26 02:21:50,793 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: there are no corrupt file blocks.
2015-03-26 02:21:59,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2015-03-26 02:21:59,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2015-03-26 02:22:29,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2015-03-26 02:22:29,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2015-03-26 02:22:50,818 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: there are no corrupt file blocks.
2015-03-26 02:22:51,058 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 97 Total time for transactions(ms): 61 Number of transactions batched in Syncs: 7 Number of syncs: 84 SyncTimes(ms): 289
2015-03-26 02:22:51,472 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/sasdata-2015-03-26-07-20-34-420-e-00003.dlv. BP-150411824-127.0.0.1-1418915217884 blk_1073742312_1498{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-58c87cab-9390-42e3-ba20-58d96ba4696e:NORMAL|RBW]]}
2015-03-26 02:22:52,376 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1. For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2015-03-26 02:22:52,376 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: File /tmp/sasdata-2015-03-26-07-20-34-420-e-00003.dlv could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
2015-03-26 02:22:52,376 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.100.1:49699 Call#30 Retry#0
java.io.IOException: File /tmp/sasdata-2015-03-26-07-20-34-420-e-00003.dlv could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1504)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3065)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-03-26 02:22:52,629 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: User does not belong to hive
2015-03-26 02:22:52,630 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setOwner from 127.0.0.1:41381 Call#191 Retry#0: org.apache.hadoop.security.AccessControlException: User does not belong to hive
2015-03-26 02:22:59,035 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30006 milliseconds
2015-03-26 02:22:59,035 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2015-03-26 02:23:29,039 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30004 milliseconds
2015-03-26 02:23:29,039 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2015-03-26 02:23:50,792 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: there are no corrupt file blocks.