
Replication Factor for SASHDAT and HDFS

Contributor
Posts: 20

When we use the SASHDAT LIBNAME engine, the files are placed on HDFS using path=<HDFS path>, and copies= sets the number of replications. The replication factor for SASHDAT tables is 2 by default.

On HDFS, however, the replication factor is 3 by default. So if a SAS table is loaded to HDFS through SASHDAT, will it have 2 copies (the SASHDAT default) or 3 copies (the HDFS default)? How is that possible?

I'm a bit confused. Can anyone explain this, please?


Accepted Solutions
Solution
‎12-15-2017 05:14 AM
SAS Employee
Posts: 33

Re: Replication Factor for SASHDAT and HDFS

Posted in reply to akpattnaik

I tested as LinusH suggested. He is correct. HDFS has a default replication factor and the SASHDAT engine overrides that when it creates files in HDFS. The LIBNAME engine for SASHDAT has a default value for copies= even if you don't specify it on the LIBNAME statement. This is what I found in the doc:

COPIES=n

specifies the number of replications to make for the data set (beyond the original blocks). The default value is 2 when the INNAMEONLY option is specified and otherwise is 1. Replicated blocks are used to provide fault tolerance. If a machine in the cluster becomes unavailable, then the blocks needed for the SASHDAT file can be retrieved from replications on other machines. If you specify COPIES=0, then the original blocks are distributed, but no replications are made and there is no fault tolerance for the data.
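Reading the sentence above literally, COPIES= counts replications in addition to the original blocks, so the total number of block copies would be one more than the option value. The small Python sketch below only illustrates that arithmetic; the function name is hypothetical, and the assumption that the total maps directly onto the replication factor HDFS reports is mine, not something stated in the doc.

```python
def total_block_copies(copies_option: int) -> int:
    """Total copies of each block, assuming COPIES= counts
    replications beyond the original blocks (as the doc text says)."""
    if copies_option < 0:
        raise ValueError("COPIES= cannot be negative")
    return 1 + copies_option


# Values taken from the doc text quoted above.
print(total_block_copies(1))  # default without INNAMEONLY
print(total_block_copies(2))  # default with INNAMEONLY
print(total_block_copies(0))  # COPIES=0: originals only, no fault tolerance
```

Under that reading, the default of COPIES=1 would give two copies of each block in total, which may be where the "2 by default" in the question comes from.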

 

Here is the link to that part of the documentation: http://support.sas.com/documentation/cdl/en/inmsref/70021/HTML/default/viewer.htm#p0kn1b8a7yt44fn1qw...

 

Also, here is where I found out how to check the replication factor of HDFS files with HDFS commands:

https://www.systutorials.com/qa/1297/how-to-check-the-replication-factor-of-a-file-in-hdfs
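As a rough sketch of that check: the second column of an `hdfs dfs -ls` listing is the file's replication factor, and `hdfs dfs -stat %r <path>` prints it directly. The Python below parses one listing line and, separately, wraps the `-stat` call. The sample line, user name, and path are made up for illustration, and the second function assumes the `hdfs` client is installed and on the PATH.

```python
import subprocess
from typing import Optional


def replication_from_ls_line(line: str) -> Optional[int]:
    """Return the replication factor from one `hdfs dfs -ls` output line.

    Returns None for directories (shown as '-') and for the
    'Found N items' header line.
    """
    fields = line.split()
    if len(fields) < 8:
        return None
    replication = fields[1]
    return int(replication) if replication.isdigit() else None


def replication_via_stat(path: str) -> int:
    """Ask HDFS for the replication factor of `path` (needs the hdfs client)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-stat", "%r", path],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


# Hypothetical listing line for a SASHDAT file; not real cluster output.
sample = "-rw-r--r--   2 sasdemo supergroup    1048576 2017-12-15 05:14 /hps/cars.sashdat"
print(replication_from_ls_line(sample))
```

On a real cluster, calling `replication_via_stat` on the table's HDFS path would show whether the file carries the SASHDAT setting or the HDFS default.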



All Replies
Super User
Posts: 5,881

Re: Replication Factor for SASHDAT and HDFS

Posted in reply to akpattnaik

I don't do this a lot, so from my perspective I would guess that the SAS default overrides the HDFS default. So no, I don't think there will be three copies. Have you checked in the file system?

Data never sleeps

☑ This topic is solved.
