Hello,
I have a problem with SAS HPDM. When I run my SAS code I see a warning message:
WARNING: The datanode running on cloudera-node2.example.loc did not respond. No file blocks are written on that host.
How can I debug the connection to the datanode cloudera-node2.example.loc?
I am using Cloudera Hadoop 5.11, SAS 9.4, TKGrid, the SAS Plug-ins for Hadoop package (SASHDAT), the SAS Embedded Process for Hadoop, and RHEL 6.8.
The Hadoop cluster has 4 nodes.
7 proc hpatest;
8 performance nodes=all details;
9 run;
NOTE: The HPATEST procedure is executing in the distributed computing environment with 2 worker nodes.
Performance Information
Host Node cloudera-node1
Execution Mode Distributed
Number of Compute Nodes 2
NOTE: The PROCEDURE HPATEST printed page 1.
NOTE: PROCEDURE HPATEST used (Total process time):
real time 2.01 seconds
cpu time 0.06 seconds
10
11 libname hdatLib sashdat path="/data/hps" verbose=yes;
NOTE: Libref HDATLIB was successfully assigned as follows:
Engine: SASHDAT
Physical Name: Directory '/data/hps' of HDFS cluster on host 'cloudera-node1'
12
13 data simData;
14 array _a{8} _temporary_ (0,0,0,1,0,1,1,1);
15 array _b{8} _temporary_ (0,0,1,0,1,0,1,1);
16 array _c{8} _temporary_ (0,1,0,0,1,1,0,1);
17 do obsno=1 to 10000000;
18 x = rantbl(1,0.28,0.18,0.14,0.14,0.03,0.09,0.08,0.06);
19 a1 = _a{x};
20 b1 = _b{x};
21 c1 = _c{x};
22 x1 = int(ranuni(1)*400);
23 x2 = 52 + ranuni(1)*38;
24 x3 = ranuni(1)*12;
25 lp = 6. -0.015*(1-a1) + 0.7*(1-b1) + 0.6*(1-c1) + 0.02*x1 -0.05*x2 - 0.1*x3;
26 y1 = ranbin(1,1,(1/(1+exp(lp))));
27 output;
28 end;
29 drop x lp;
30 run;
NOTE: The data set WORK.SIMDATA has 10000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 3.36 seconds
cpu time 3.36 seconds
31
32 data hdatLib.simData;
33 set simData;
34 run;
WARNING: The datanode running on cloudera-node2.example.loc did not respond. No file blocks are written on that host.
NOTE: There were 10000000 observations read from the data set WORK.SIMDATA.
NOTE: The data set /data/hps/simdata has 10000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 15.49 seconds
cpu time 1.79 seconds
First of all, check the HDFS cluster status:
hdfs dfsadmin -report
Something is wrong with the datanode running on cloudera-node2.example.loc.
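For a quick first pass, you can filter the report for node status and confirm basic network reachability from the SAS head node (the hostname below is taken from the warning; adjust it for your cluster):
hdfs dfsadmin -report | grep -E 'Hostname|Last contact|Decommission Status'
ping -c 3 cloudera-node2.example.loc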
You can use the following steps to enable advanced debugging for loading to HDFS from SAS (a consolidated shell sketch of these steps follows the list):
1. Create a .tkmpi.personal file in the /home/ directory of the user invoking SAS or LASR:
/home/userid/.tkmpi.personal
2. Insert the following line in the file:
export HADOOP_LASR_STDERR_LOG=/tmp/hdfs_fail.txt
3. Copy the .tkmpi.personal file across the hosts of the grid:
/<PATH_TO_TKGRID>/bin/simcp /home/userid/.tkmpi.personal /home/userid
4. Perform the task or submit the code that generated the problem, for example:
libname hdfs SASHDAT SERVER="headnode.unx.lax.com"
INSTALL="/local/install/TKGrid" PATH="/hps/user";
data hdfs.mydata(replace=yes);
set sashelp.class;
run;
5. Inspect the /tmp/hdfs_fail.txt log. A clean log looks like this:
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Starting embedded NameNodeService
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Starting NameNodeService
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Data dirs: [/local/install/hadoop/hadoop-data]
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Ready to accept commands
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Processing command 9
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: ConcatRequest [fileName=/hps/user/mydata.sashdat, flags=1, permissionsMask=774,
fileParts=[FilePartInfo [fullPath=/hps/user/mydata.sashdat-12.34.567.89.sashdat, numBlocks=1], ...
The fileParts= details are repeated for each grid host.
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Action 9 complete
6. Check the head node first. If that log is clean, you should check the debug log files on other nodes.
7. When debugging is complete, remove the .tkmpi.personal file (or at least the export line) from the home directory on every host of the grid, for example by clearing the local copy and distributing it again with simcp:
/<PATH_TO_TKGRID>/bin/simcp /home/userid/.tkmpi.personal /home/userid
Otherwise the debugging option may cause performance issues.
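Putting steps 1-3 and 7 together as a shell sketch (the TKGrid path and userid are the same placeholders used in the steps above; adjust them to your installation):

# Steps 1-2: create .tkmpi.personal with the debug logging variable
cat > /home/userid/.tkmpi.personal <<'EOF'
export HADOOP_LASR_STDERR_LOG=/tmp/hdfs_fail.txt
EOF

# Step 3: distribute the file to every host of the grid
/<PATH_TO_TKGRID>/bin/simcp /home/userid/.tkmpi.personal /home/userid

# Step 7 (after debugging): empty the file and distribute it again to switch the logging off
> /home/userid/.tkmpi.personal
/<PATH_TO_TKGRID>/bin/simcp /home/userid/.tkmpi.personal /home/userid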
Thank You
# hdfs dfsadmin -report
Configured Capacity: 590688002048 (550.12 GB)
Present Capacity: 471623994732 (439.23 GB)
DFS Remaining: 467021839791 (434.95 GB)
DFS Used: 4602154941 (4.29 GB)
DFS Used%: 0.98%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: (cloudera-node1.example.loc)
Hostname: cloudera-node1.example.loc
Rack: /default
Decommission Status : Normal
Configured Capacity: 197020434432 (183.49 GB)
DFS Used: 1533861821 (1.43 GB)
Non DFS Used: 77469122627 (72.15 GB)
DFS Remaining: 107416443365 (100.04 GB)
DFS Used%: 0.78%
DFS Remaining%: 54.52%
Configured Cache Capacity: 1387266048 (1.29 GB)
Cache Used: 0 (0 B)
Cache Remaining: 1387266048 (1.29 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 4
Last contact: Wed Oct 04 11:09:44 MSK 2017

Name: (cloudera-node2.example.loc)
Hostname: cloudera-node2.example.loc
Rack: /default
Decommission Status : Normal
Configured Capacity: 196833783808 (183.32 GB)
DFS Used: 1533829120 (1.43 GB)
Non DFS Used: 6064525312 (5.65 GB)
DFS Remaining: 178575792613 (166.31 GB)
DFS Used%: 0.78%
DFS Remaining%: 90.72%
Configured Cache Capacity: 2599419904 (2.42 GB)
Cache Used: 0 (0 B)
Cache Remaining: 2599419904 (2.42 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 4
Last contact: Wed Oct 04 11:09:43 MSK 2017

Name: (cloudera-node3.example.loc)
Hostname: cloudera-node3.example.loc
Rack: /default
Decommission Status : Normal
Configured Capacity: 196833783808 (183.32 GB)
DFS Used: 1534464000 (1.43 GB)
Non DFS Used: 3610079232 (3.36 GB)
DFS Remaining: 181029603813 (168.60 GB)
DFS Used%: 0.78%
DFS Remaining%: 91.97%
Configured Cache Capacity: 2885681152 (2.69 GB)
Cache Used: 0 (0 B)
Cache Remaining: 2885681152 (2.69 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 4
Last contact: Wed Oct 04 11:09:43 MSK 2017
[root@cloudera-node2 ~]# cat /tmp/hdfs_fail.txt
17/10/04 18:55:52 INFO hadoop.BaseService: Starting embedded DataNodeService 20141117a
17/10/04 18:55:54 INFO hadoop.BaseService: Starting DataNodeService 20141117a
17/10/04 18:55:54 INFO hadoop.BaseService: Creating configuration
17/10/04 18:55:54 INFO hadoop.BaseService: sudoCommand=sudo
17/10/04 18:55:54 INFO hadoop.BaseService: shortCircuitCommand=/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/hadoop/bin/saslasrfd
17/10/04 18:55:54 INFO hadoop.BaseService: NameNode port=15452, DataNode port=15453
17/10/04 18:55:54 INFO hadoop.BaseService: Service version 6.0
17/10/04 18:55:54 INFO hadoop.BaseService: Data dirs: [/tmp/hadoop-sassrv/dfs/data]
17/10/04 18:55:54 ERROR hadoop.BaseService: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
java.io.IOException: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
        at com.sas.lasr.hadoop.BaseService.invokeCommand(BaseService.java:1312)
        at com.sas.lasr.hadoop.BaseService.getMountpoint(BaseService.java:1269)
        at com.sas.lasr.hadoop.DataNodeService.createTempDirs(DataNodeService.java:215)
        at com.sas.lasr.hadoop.DataNodeService.start(DataNodeService.java:162)
        at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:98)
17/10/04 18:55:55 INFO hadoop.BaseService: Ready to accept commands
17/10/04 18:55:55 INFO hadoop.BaseService: Processing command 1
17/10/04 18:55:55 ERROR hadoop.BaseService: Connection refused (Connection refused)
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at java.net.Socket.<init>(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:211)
        at com.sas.lasr.hadoop.BaseService.sendPing(BaseService.java:642)
        at com.sas.lasr.hadoop.DataNodeService.handleCommand(DataNodeService.java:286)
        at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:107)
17/10/04 17:48:17 ERROR hadoop.BaseService: Connection refused (Connection refused)
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at java.net.Socket.<init>(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:211)
        at com.sas.lasr.hadoop.BaseService.sendPing(BaseService.java:642)
        at com.sas.lasr.hadoop.DataNodeService.handleCommand(DataNodeService.java:286)
        at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:107)
On the data node (cloudera-node2), TCP port 15453 is not listening.
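For example, this can be checked on the node itself (port number taken from the log above; run as root to see process names):
netstat -tlnp | grep 15453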
How can I debug LASR?
There are no problems with LASR; this is the actual problem:
17/10/04 18:55:54 INFO hadoop.BaseService: Data dirs: [/tmp/hadoop-sassrv/dfs/data]
17/10/04 18:55:54 ERROR hadoop.BaseService: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
java.io.IOException: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
at com.sas.lasr.hadoop.BaseService.invokeCommand(BaseService.java:1312)
at com.sas.lasr.hadoop.BaseService.getMountpoint(BaseService.java:1269)
at com.sas.lasr.hadoop.DataNodeService.createTempDirs(DataNodeService.java:215)
at com.sas.lasr.hadoop.DataNodeService.start(DataNodeService.java:162)
at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:98)
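In other words, the df call against the configured data directory failed on cloudera-node2, so the SAS Plug-ins DataNodeService could not create its temporary directories there and never started listening on port 15453, which is why the SASHDAT write skips that host. A quick way to confirm this on the node (the path is taken from the log above):
ls -ld /tmp/hadoop-sassrv/dfs/data
df /tmp/hadoop-sassrv/dfs/data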
OK, but how do I resolve it? What should I pay attention to?