VladimirLaskov
Calcite | Level 5

Hello,
I have a problem with SAS HPDM. When I run SAS code, I see this warning message:

WARNING: The datanode running on cloudera-node2.example.loc did not respond. No file blocks are written on that host.

How can I debug the connection to the datanode cloudera-node2.example.loc?

I use Cloudera Hadoop 5.11, SAS 9.4, TKGrid, the SAS Plug-ins for Hadoop package (SASHDAT), SAS Embedded Process for Hadoop, and RHEL 6.8.
The Hadoop cluster has 4 nodes.



7 proc hpatest;
8 performance nodes=all details;
9 run;
NOTE: The HPATEST procedure is executing in the distributed computing environment with 2 worker nodes.
The SAS System 18:31 Tuesday, October 3, 2017 1

Performance Information

Host Node cloudera-node1
Execution Mode Distributed
Number of Compute Nodes 2
NOTE: The PROCEDURE HPATEST printed page 1.
NOTE: PROCEDURE HPATEST used (Total process time):
real time 2.01 seconds
cpu time 0.06 seconds

10
11 libname hdatLib sashdat path="/data/hps" verbose=yes;
NOTE: Libref HDATLIB was successfully assigned as follows:
Engine: SASHDAT
Physical Name: Directory '/data/hps' of HDFS cluster on host 'cloudera-node1'
12
13 data simData;
14 array _a{8} _temporary_ (0,0,0,1,0,1,1,1);
15 array _b{8} _temporary_ (0,0,1,0,1,0,1,1);
16 array _c{8} _temporary_ (0,1,0,0,1,1,0,1);
17 do obsno=1 to 10000000;
18 x = rantbl(1,0.28,0.18,0.14,0.14,0.03,0.09,0.08,0.06);
19 a1 = _a{x};
20 b1 = _b{x};
21 c1 = _c{x};
22 x1 = int(ranuni(1)*400);
23 x2 = 52 + ranuni(1)*38;

24 x3 = ranuni(1)*12;
25 lp = 6. -0.015*(1-a1) + 0.7*(1-b1) + 0.6*(1-c1) + 0.02*x1 -0.05*x2 - 0.1*x3;
26 y1 = ranbin(1,1,(1/(1+exp(lp))));
27 output;
28 end;
29 drop x lp;
30 run;
NOTE: The data set WORK.SIMDATA has 10000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 3.36 seconds
cpu time 3.36 seconds

31
32 data hdatLib.simData;
33 set simData;
34 run;
WARNING: The datanode running on cloudera-node2.example.loc did not respond. No file blocks are written on that host.
NOTE: There were 10000000 observations read from the data set WORK.SIMDATA.
NOTE: The data set /data/hps/simdata has 10000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 15.49 seconds
cpu time 1.79 seconds





alexal
SAS Employee

@VladimirLaskov,

 

First of all, check the HDFS cluster status:

 

hdfs dfsadmin -report

 

Something is wrong with the datanode running on cloudera-node2.example.loc.
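
The report is long, so it can help to reduce it to one line per datanode. The following is only a sketch: the function names summarize_datanodes and datanode_reported are made up for illustration, and the parsing assumes the report contains "Hostname:" and "Last contact:" lines for each datanode.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop or SAS).

# Read `hdfs dfsadmin -report` output on stdin and print, for each
# datanode, its hostname and "Last contact" value separated by a tab.
summarize_datanodes() {
  awk -F': ' '
    /^Hostname: /     { host = $2 }
    /^Last contact: / { print host "\t" $2 }
  '
}

# Read the same report on stdin and succeed only if the given
# hostname appears among the reported datanodes.
datanode_reported() {
  summarize_datanodes | cut -f1 | grep -qxF "$1"
}
```

Usage would look like `hdfs dfsadmin -report | summarize_datanodes`, or `hdfs dfsadmin -report | datanode_reported cloudera-node2.example.loc && echo reported`. A host that is missing from the output, or that shows an old "Last contact" time, is the one to look at first.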

 

You can use the following steps to enable advanced debugging for loading to HDFS from SAS:

1. Create a .tkmpi.personal file in the home directory of the user invoking SAS or LASR:
/home/userid/.tkmpi.personal

2. Insert the following line in the file:
export HADOOP_LASR_STDERR_LOG=/tmp/hdfs_fail.txt

3. Copy the .tkmpi.personal file to all hosts of the grid:
/<PATH_TO_TKGRID>/bin/simcp /home/userid/.tkmpi.personal /home/userid

4. Perform the task or submit the code that generated the problem. For example:
libname hdfs SASHDAT SERVER="headnode.unx.lax.com"
INSTALL="/local/install/TKGrid" PATH="/hps/user";
data hdfs.mydata(replace=yes);
set sashelp.class;
run;

5. Inspect the /tmp/hdfs_fail.txt log. A clean log looks like this:
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Starting embedded NameNodeService
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Starting NameNodeService
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Data dirs: [/local/install/hadoop/hadoop-data]
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Ready to accept commands
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: Processing command 9
yy/mm/dd hh:mm:ss INFO hadoop.BaseService: ConcatRequest [fileName=/hps/user/mydata.sashdat, flags=1, permissionsMask=774,
fileParts=[FilePartInfo [fullPath=/hps/user/mydata.sashdat-12.34.567.89.sashdat, numBlocks=1], ...
(The fileParts= details are repeated for each grid host.)
yy/mm/dd hh:mm:35 INFO hadoop.BaseService: Action 9 complete

6. Check the head node first. If that log is clean, you should check the debug log files on other nodes.

7. When debugging is complete, remove the .tkmpi.personal file from all hosts of the grid:
/<PATH_TO_TKGRID>/bin/simsh rm /home/userid/.tkmpi.personal
Otherwise, the debugging option may cause performance issues.
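
Steps 1, 2 and 5 can be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function names enable_hdfs_debug and show_log_errors are hypothetical, and the search pattern assumes that problem lines in the debug log contain the word ERROR surrounded by spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the steps above (not shipped with SAS or TKGrid).

# Steps 1-2: write the debugging switch into <home_dir>/.tkmpi.personal.
# Note: this overwrites an existing .tkmpi.personal in that directory.
enable_hdfs_debug() {
  local home_dir="$1"
  local log_file="${2:-/tmp/hdfs_fail.txt}"
  printf 'export HADOOP_LASR_STDERR_LOG=%s\n' "$log_file" \
    > "$home_dir/.tkmpi.personal"
}

# Step 5: print the ERROR lines of a debug log, prefixed with line numbers.
# Returns non-zero when no ERROR line is found.
show_log_errors() {
  grep -n ' ERROR ' "$1"
}
```

For example, `enable_hdfs_debug /home/userid` followed by the simcp command from step 3, and later `show_log_errors /tmp/hdfs_fail.txt` on each host after rerunning the failing code.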

VladimirLaskov
Calcite | Level 5

 


Thank You
 

# hdfs dfsadmin -report
Configured Capacity: 590688002048 (550.12 GB)
Present Capacity: 471623994732 (439.23 GB)
DFS Remaining: 467021839791 (434.95 GB)
DFS Used: 4602154941 (4.29 GB)
DFS Used%: 0.98%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: (cloudera-node1.example.loc)
Hostname: cloudera-node1.example.loc
Rack: /default
Decommission Status : Normal
Configured Capacity: 197020434432 (183.49 GB)
DFS Used: 1533861821 (1.43 GB)
Non DFS Used: 77469122627 (72.15 GB)
DFS Remaining: 107416443365 (100.04 GB)
DFS Used%: 0.78%
DFS Remaining%: 54.52%
Configured Cache Capacity: 1387266048 (1.29 GB)
Cache Used: 0 (0 B)
Cache Remaining: 1387266048 (1.29 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 4
Last contact: Wed Oct 04 11:09:44 MSK 2017


Name: (cloudera-node2.example.loc)
Hostname: cloudera-node2.example.loc
Rack: /default
Decommission Status : Normal
Configured Capacity: 196833783808 (183.32 GB)
DFS Used: 1533829120 (1.43 GB)
Non DFS Used: 6064525312 (5.65 GB)
DFS Remaining: 178575792613 (166.31 GB)
DFS Used%: 0.78%
DFS Remaining%: 90.72%
Configured Cache Capacity: 2599419904 (2.42 GB)
Cache Used: 0 (0 B)
Cache Remaining: 2599419904 (2.42 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 4
Last contact: Wed Oct 04 11:09:43 MSK 2017


Name: (cloudera-node3.example.loc)
Hostname: cloudera-node3.example.loc
Rack: /default
Decommission Status : Normal
Configured Capacity: 196833783808 (183.32 GB)
DFS Used: 1534464000 (1.43 GB)
Non DFS Used: 3610079232 (3.36 GB)
DFS Remaining: 181029603813 (168.60 GB)
DFS Used%: 0.78%
DFS Remaining%: 91.97%
Configured Cache Capacity: 2885681152 (2.69 GB)
Cache Used: 0 (0 B)
Cache Remaining: 2885681152 (2.69 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 4
Last contact: Wed Oct 04 11:09:43 MSK 2017



[root@cloudera-node2 ~]# cat /tmp/hdfs_fail.txt
17/10/04 18:55:52 INFO hadoop.BaseService: Starting embedded DataNodeService 20141117a
17/10/04 18:55:54 INFO hadoop.BaseService: Starting DataNodeService 20141117a
17/10/04 18:55:54 INFO hadoop.BaseService: Creating configuration
17/10/04 18:55:54 INFO hadoop.BaseService: sudoCommand=sudo
17/10/04 18:55:54 INFO hadoop.BaseService: shortCircuitCommand=/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/hadoop/bin/saslasrfd
17/10/04 18:55:54 INFO hadoop.BaseService: NameNode port=15452, DataNode port=15453
17/10/04 18:55:54 INFO hadoop.BaseService: Service version 6.0
17/10/04 18:55:54 INFO hadoop.BaseService: Data dirs: [/tmp/hadoop-sassrv/dfs/data]
17/10/04 18:55:54 ERROR hadoop.BaseService: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
java.io.IOException: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
	at com.sas.lasr.hadoop.BaseService.invokeCommand(BaseService.java:1312)
	at com.sas.lasr.hadoop.BaseService.getMountpoint(BaseService.java:1269)
	at com.sas.lasr.hadoop.DataNodeService.createTempDirs(DataNodeService.java:215)
	at com.sas.lasr.hadoop.DataNodeService.start(DataNodeService.java:162)
	at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:98)
17/10/04 18:55:55 INFO hadoop.BaseService: Ready to accept commands
17/10/04 18:55:55 INFO hadoop.BaseService: Processing command 1
17/10/04 18:55:55 ERROR hadoop.BaseService: Connection refused (Connection refused)
java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:211)
	at com.sas.lasr.hadoop.BaseService.sendPing(BaseService.java:642)
	at com.sas.lasr.hadoop.DataNodeService.handleCommand(DataNodeService.java:286)
	at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:107)
17/10/04 17:48:17 ERROR hadoop.BaseService: Connection refused (Connection refused)
java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:211)
	at com.sas.lasr.hadoop.BaseService.sendPing(BaseService.java:642)
	at com.sas.lasr.hadoop.DataNodeService.handleCommand(DataNodeService.java:286)
	at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:107)


On the data node (cloudera-node2), nothing is listening on TCP port 15453.

How can I debug LASR?
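
One way to check a port like this is to filter the output of `ss -ltn`. This is only a sketch: port_in_listing is a made-up helper name, and it assumes the usual `ss` column layout in which the first field is the state and the fourth field is the local address and port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` output on stdin and succeed
# only if some socket is in LISTEN state on the given local port.
port_in_listing() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

On cloudera-node2 this could be run as `ss -ltn | port_in_listing 15453 || echo "nothing listening on 15453"`.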

alexal
SAS Employee

@VladimirLaskov,

 

There is no problem with LASR. This is the problem:

 

17/10/04 18:55:54 INFO hadoop.BaseService: Data dirs: [/tmp/hadoop-sassrv/dfs/data]
17/10/04 18:55:54 ERROR hadoop.BaseService: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
java.io.IOException: Invalid rc (1) from process on host cloudera-node2.example.loc [df, /tmp/hadoop-sassrv/dfs/data]: df: `/tmp/hadoop-sassrv/dfs/data': No such file or directory df: no file systems processed
	at com.sas.lasr.hadoop.BaseService.invokeCommand(BaseService.java:1312)
	at com.sas.lasr.hadoop.BaseService.getMountpoint(BaseService.java:1269)
	at com.sas.lasr.hadoop.DataNodeService.createTempDirs(DataNodeService.java:215)
	at com.sas.lasr.hadoop.DataNodeService.start(DataNodeService.java:162)
	at com.sas.lasr.hadoop.DataNodeService.main(DataNodeService.java:98)
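
The log shows the service running `df` against its data directory and getting return code 1 because the path does not exist. The same probe can be repeated by hand on each node. This is a sketch with a hypothetical helper name, check_data_dir; it only reports the state of the directory and does not try to fix anything, since which directory the service should be using is a configuration question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeat the `df` probe from the log for one directory
# and print OK, MISSING or DF_FAILED followed by the path.
check_data_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "MISSING: $dir"
    return 1
  fi
  if ! df -P "$dir" > /dev/null 2>&1; then
    echo "DF_FAILED: $dir"
    return 1
  fi
  echo "OK: $dir"
}
```

For example, `check_data_dir /tmp/hadoop-sassrv/dfs/data` run on cloudera-node2 and on a node that works would show whether the directory from the "Data dirs:" line exists on one host and not on the other.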
VladimirLaskov
Calcite | Level 5

OK, but how do I resolve it? Where should I look?

alexal
SAS Employee

@VladimirLaskov,

 

Who is managing your Hadoop environment? Your Hadoop administrator can help you.
