akpattnaik
Obsidian | Level 7

When we use the SASHDAT LIBNAME engine, files are placed on HDFS at the location given by path=<HDFS path>, and the copies= option sets the number of replications. The replication factor for SASHDAT tables is 2 by default, whereas the HDFS default replication factor is 3. So if a SAS table is loaded to HDFS through SASHDAT, will it have 2 copies (the SASHDAT default) or 3 copies (the HDFS default)? How do the two settings interact?

I'm a bit confused. Can anyone explain this, please?

Accepted Solution
DavidGhan
SAS Employee

I tested this as LinusH suggested, and he is correct. HDFS has its own default replication factor, and the SASHDAT engine overrides it when it creates files in HDFS. The LIBNAME engine for SASHDAT applies a default value for COPIES= even if you don't specify it on the LIBNAME statement. This is what I found in the doc:

COPIES=n

specifies the number of replications to make for the data set (beyond the original blocks). The default value is 2 when the INNAMEONLY option is specified and otherwise is 1. Replicated blocks are used to provide fault tolerance. If a machine in the cluster becomes unavailable, then the blocks needed for the SASHDAT file can be retrieved from replications on other machines. If you specify COPIES=0, then the original blocks are distributed, but no replications are made and there is no fault tolerance for the data.

 

Here is the link to that part of the documentation: http://support.sas.com/documentation/cdl/en/inmsref/70021/HTML/default/viewer.htm#p0kn1b8a7yt44fn1qw...

 

Also, here is where I found out how to check the replication factor of HDFS files using HDFS commands:

https://www.systutorials.com/qa/1297/how-to-check-the-replication-factor-of-a-file-in-hdfs
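
As a rough illustration of that check: `hdfs dfs -stat %r <path>` prints a file's replication factor, and the second column of `hdfs dfs -ls` output shows it as well (a `-` appears there for directories). The Python sketch below only parses one `-ls` line. The function name and the sample lines, including the path, owner, and size, are made up for illustration and are not output from a real cluster.

```python
def replication_from_ls_line(line):
    """Return the replication factor from one `hdfs dfs -ls` output line.

    Returns None for directories, which show '-' in the replication column.
    """
    fields = line.split()
    # Expected columns: permissions, replication, owner, group,
    # size, date, time, path
    if len(fields) < 8:
        raise ValueError("not a recognizable `hdfs dfs -ls` entry: %r" % line)
    replication = fields[1]
    return None if replication == "-" else int(replication)


# Hypothetical sample lines (assumed values, not real cluster output).
sample_file = "-rw-r--r--   2 sasdemo supergroup    1048576 2017-03-01 10:15 /hps/mytable.sashdat"
sample_dir = "drwxr-xr-x   - sasdemo supergroup          0 2017-03-01 10:14 /hps"

print(replication_from_ls_line(sample_file))  # 2
print(replication_from_ls_line(sample_dir))   # None
```

Running the real command against the path used on the LIBNAME statement is what shows which value was actually applied to the file.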


2 REPLIES
LinusH
Tourmaline | Level 20

I don't do this a lot, so this is a guess from my perspective: the SAS default overrides the HDFS default. So no, I don't think there will be three copies. Have you checked in the file system?

Data never sleeps
