BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JBailey
Barite | Level 11

Hi Daniele,

I have reproduced this.

When I have the SAS_HADOOP_CONFIG_PATH= environment set properly, it works. If this isn't set properly, it creates the table but doesn't put data into it.

I have the following files in there (I got them from the CDH quickstart):

core-site.xml

hdfs-site.xml

hive-site.xml

mapred-site.xml

yarn-site.xml

Cloudera Manager will download these for you. Connect a local browser to Cloudera Manager on the VM. This will allow you to easily get them to your PC.

From the home screen select Hive.

There is an Actions button in the upper right hand corner. Select Download Client Configuration from the pull-down.

It will download a zip to your machine.

Take the XML files and place them into the directory pointed to by your SAS_HADOOP_CONFIG_PATH= environment variable. Restart SAS so it picks up the new environment. Then run this code to see if it works.

libname mycdh hadoop server=quickstart user=cloudera;

data mycdh.cars;

  set sashelp.cars;

run;

/* this is a streaming HDFS read */

proc sql;

  select * from mycdh.cars;

quit;

/* this invokes mapreduce */

proc sql;

  select count(*) from mycdh.cars;

quit;

daniele_laguardia
SAS Employee

Hi Jeff,

Thank you very much for suggestions. My focus is now to solve the problem on the physical hadoop cluster.

Going into a deep analysis of the problem we found a possible issue in the network communication. In particular, seems to to be necessary a complete visibility between the SAS Client and all HDFS datanode . Could you confirm this ? The SAS client needs to reach all datanode to the HDFS port (50010 / 50020 / 50075) ? There are other service / port on data nodes that need to be reached ?

Now we can reach only the name node at port 8020 and hive server at  port 10000 but there is firewall rules between our client and the data nodes of the cluster.

Thank you

D.

FriedEgg
SAS Employee

,

As I stated initially, you need to make sure the data nodes are reachable from the SAS client.  This is one of the few reasons you would see this type of issue.  I can think of three reasons this type of issue can occur:

1. Sticky bit on /tmp, or HDFS_TEMP location is not writable for any other reason.

2. The data nodes are not reachable from the SAS client, by IP address and port, or by hostname if dfs.datanode.use.datanode.hostname is set in hdfs config

3. The namenode needs to be re-formatted as the configuration of the datanodes has changed or has otherwise been corrupted in some manner

daniele_laguardia
SAS Employee

Hi,

thanks for the quick response Smiley Happy

This is the situation:

[tucloud@usqrtpl5394 ~]$ hdfs dfs -ls /

Found 5 items

drwxrwxrwx   - hdfs  supergroup          0 2015-03-18 18:50 /crmon

drwxrwxrwx   - hdfs  supergroup          0 2015-02-16 10:51 /h2oTEMP

drwxr-xr-x   - sasep sasep               0 2015-02-27 15:21 /sas

drwxrwxrwt   - hdfs  supergroup          0 2015-03-25 15:33 /tmp

drwxr-xr-x   - hdfs  supergroup          0 2015-02-10 13:21 /user

sticky bit it's on but i have the same problem if i define HDFS_TEMPDIR parameter in the libname statment:

libname hive hadoop

server='usqrtpl5396.internal.unicreditgroup.eu'

user=tucloud

HDFS_TEMPDIR="/user/tucloud"

database=crmon

subprotocol=hive2;

I can reach any node with ping cmd from my sas server.

Thanks

D.

FriedEgg
SAS Employee

Do you have access to the name node and data node logs?

daniele_laguardia
SAS Employee

I'm able to connect to the name node and open this logs:

[tucloud@usqrtpl5394 hadoop-hdfs]$ pwd

/var/log/hadoop-hdfs

[tucloud@usqrtpl5394 hadoop-hdfs]$ ls

hadoop-cmf-hdfs-DATANODE-usqrtpl5394.internal.unicreditgroup.eu.log.out            hdfs-audit.log

hadoop-cmf-hdfs-FAILOVERCONTROLLER-usqrtpl5394.internal.unicreditgroup.eu.log.out  hdfs-audit.log.1

hadoop-cmf-hdfs-JOURNALNODE-usqrtpl5394.internal.unicreditgroup.eu.log.out         hdfs-audit.log.2

hadoop-cmf-hdfs-NAMENODE-usqrtpl5394.internal.unicreditgroup.eu.log.out            SecurityAuth-hdfs.audit

hadoop-cmf-hdfs-NAMENODE-usqrtpl5394.internal.unicreditgroup.eu.log.out.1          stacks

hadoop-cmf-hdfs-NAMENODE-usqrtpl5394.internal.unicreditgroup.eu.log.out.2

[tucloud@usqrtpl5394 hadoop-hdfs]$

but no error found (tail on log file when the sas script is running)

Is /var/log/hadoop-hdfs the logs directory ?

JBailey
Barite | Level 11

Hi Daniele,

Which version of SAS are you using? SAS 9.4M2 will tell you that there is a problem with the sticky bit. I am using the CDH 5.3 quickstart and SAS 9.4M2.

The CDH Quickstart works perfectly, out of the box. I had to set the sticky bit on /tmp in order to generate the following error.

/*set the sticky bit from the quickstart terminal */

sudo -u hdfs hadoop fs -chmod 1777 /tmp

/* remove the sticky bit */

sudo -u hdfs hadoop fs -chmod -t /tmp

Output from the SAS log.

6    libname mycdh hadoop server=quickstart port=10000;

NOTE: Libref MYCDH was successfully assigned as follows:

      Engine:        HADOOP

      Physical Name: jdbc:hive2://quickstart:10000/default

7    data mycdh.cars2;

8       set sashelp.cars;

9    run;

NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.

NOTE: There were 428 observations read from the data set SASHELP.CARS.

NOTE: The data set MYCDH.CARS2 has 428 observations and 15 variables.

ERROR: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return

       code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

ERROR: Unable to execute Hadoop query.

ERROR: Execute error on statement: LOAD DATA INPATH

       '/tmp/sasdata-2015-03-25-14-49-40-774-e-00001.dlv' OVERWRITE INTO TABLE `CARS2`. Could not

       load /tmp/sasdata-2015-03-25-14-49-40-774-e-00001.dlv into table CARS2 in schema DEFAULT.

       A common cause of this issue is conflicting HDFS permissions between the data file and the

       Hive warehouse directory for the table.  Another possible cause is the "sticky" bit set on

       HDFS directory /tmp.

NOTE: DATA statement used (Total process time):

      real time           0.55 seconds

      cpu time            0.06 seconds

If you are using an older version of SAS, say 9.4M1, you will not see this error. It will look like it works but doesn't. Previous releases of CDH quickstart had the sticky bit set.

daniele_laguardia
SAS Employee

Hi Jeff,

i'm using  9.4 M2

I have the same problem even if the sticky bit is set to off.( or by changing the HDFS_TEMP path).

Opening the log file of the VM i found this message:

2015-03-26 02:21:50,793 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: there are no corrupt file blocks.

2015-03-26 02:21:59,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds

2015-03-26 02:21:59,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).

2015-03-26 02:22:29,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds

2015-03-26 02:22:29,029 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).

2015-03-26 02:22:50,818 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: there are no corrupt file blocks.

2015-03-26 02:22:51,058 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 97 Total time for transactions(ms): 61 Number of transactions batched in Syncs: 7 Number of syncs: 84 SyncTimes(ms): 289

2015-03-26 02:22:51,472 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/sasdata-2015-03-26-07-20-34-420-e-00003.dlv. BP-150411824-127.0.0.1-1418915217884 blk_1073742312_1498{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-58c87cab-9390-42e3-ba20-58d96ba4696e:NORMAL|RBW]]}

2015-03-26 02:22:52,376 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1. For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy

2015-03-26 02:22:52,376 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: File /tmp/sasdata-2015-03-26-07-20-34-420-e-00003.dlv could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

2015-03-26 02:22:52,376 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.100.1:49699 Call#30 Retry#0

java.io.IOException: File /tmp/sasdata-2015-03-26-07-20-34-420-e-00003.dlv could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1504)

  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3065)

  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)

  at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)

  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)

  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)

  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)

  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)

  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)

  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)

  at java.security.AccessController.doPrivileged(Native Method)

  at javax.security.auth.Subject.doAs(Subject.java:415)

  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)

  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

2015-03-26 02:22:52,629 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: User does not belong to hive

2015-03-26 02:22:52,630 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setOwner from 127.0.0.1:41381 Call#191 Retry#0: org.apache.hadoop.security.AccessControlException: User does not belong to hive

2015-03-26 02:22:59,035 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30006 milliseconds

2015-03-26 02:22:59,035 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).

2015-03-26 02:23:29,039 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30004 milliseconds

2015-03-26 02:23:29,039 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).

2015-03-26 02:23:50,792 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: there are no corrupt file blocks.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 22 replies
  • 7347 views
  • 5 likes
  • 4 in conversation