
Did you mean DataTransferMode? Or DataTransferMode? Or maybe ParallelMode?

Started 06-03-2019
Modified 06-03-2019

SAS Viya and Cloud Analytic Services (CAS) can work with data from a variety of source systems. Some of those source systems can move data into CAS using multiple concurrent streams, which effectively multiplies the rate at which data is loaded. From a SAS programming perspective, the ability to direct a parallel transfer depends on the nature of the environment, how the software is deployed, the data provider, and the data format. In some cases, parallel transfer is the default or possibly even the only mode of transfer. In others, it's optional or not possible, with serial transfer offered instead.

 

When there's a choice between serial and parallel data movement, then we can use specific SAS program directives. This article looks at three of them:

  1. DataTransferMode specified as a file import option
  2. DataTransferMode specified as a caslib attribute
  3. ParallelMode specified as a file import option

In the title and numbered list above, the term DataTransferMode is listed twice. That's because there really are two of them with very different definitions, and they just happen to share the same name. Even more confusing, each kind of DataTransferMode takes the same three values: auto, serial, and parallel. And again, those values mean something different when DataTransferMode is specified as a file import option than when it is specified as a caslib attribute.

 

Confusing definitions aside, understanding exactly how to direct CAS to perform the desired kind of data transfer will allow us to unlock CAS's ability to load data quickly and efficiently at scale.

File import option: DataTransferMode

When CAS is loading a standard SAS data set (.SAS7BDAT file) using the PATH type of caslib, it is possible to specify DataTransferMode as an option of the ImportOptions argument of the Load statement in the procedure code (PROC CAS and/or PROC CASUTIL). When FileType="BASESAS", the default is DataTransferMode="AUTO", which means to attempt parallel transfer first if possible, else fall back to serial transfer via the CAS Controller.

 

cas mysess1;
caslib mypath path="/path/to/data" sessref=mysess1
              datasource=(srcType='path');

proc casutil; 
   load casdata="baseball.sas7bdat" casout="baseball_bdat"
   importoptions=(filetype="basesas", dataTransferMode="auto") ;
run;

 

The PATH type of caslib defines the directory path at which all of the CAS hosts will look for the SAS data set. All CAS workers are required to participate in the parallel transfer. If any of the CAS workers cannot perform the transfer, then only serial is possible via the CAS Controller. Saving data from CAS back to the SAS data set is always a serial-only process.
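As a hedged sketch of the other two values, the same load can be pinned to one mode or the other. This reuses the mypath caslib and baseball.sas7bdat from the example above and assumes the mysess1 session is still active; the output table names are made up for illustration. How CAS reports a "parallel" request that cannot be satisfied is not covered here, so check your own log rather than taking this sketch as a statement of that behavior.

```sas
/* Serial only: the CAS Controller reads the file and distributes the rows */
proc casutil;
   load casdata="baseball.sas7bdat" incaslib="mypath"
        casout="baseball_serial" replace
        importoptions=(filetype="basesas", dataTransferMode="serial");
run;

/* Parallel requested: every CAS worker must be able to read /path/to/data */
proc casutil;
   load casdata="baseball.sas7bdat" incaslib="mypath"
        casout="baseball_parallel" replace
        importoptions=(filetype="basesas", dataTransferMode="parallel");
run;
```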

 

The dataTransferMode file import option is not available for any other fileTypes (AUTO, CSV, SASHDAT, XLS/XLSX) accessed using the PATH type of caslib (regardless of whether auto, serial, or parallel is given). Those file types are always serial transfer via the CAS Controller. To perform a parallel load of SASHDAT or CSV files from shared disk, use the DNFS type of caslib instead.
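A minimal sketch of the DNFS alternative might look like the following. It assumes the mysess1 session already exists and that the directory is shared storage mounted at the same path on every CAS host; the caslib name, path, and file name are placeholders rather than anything from a real deployment.

```sas
/* DNFS caslib: the path is assumed to be mounted identically on all CAS hosts */
caslib mydnfs path="/path/to/shared/data" sessref=mysess1
              datasource=(srcType="dnfs");

proc casutil;
   /* No dataTransferMode option is given: the DNFS caslib handles the parallel load itself */
   load casdata="baseball.sashdat" incaslib="mydnfs"
        casout="baseball_hdat" replace;
run;
```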

Caslib attribute: DataTransferMode

CAS functionality can be extended to reach external data directly by licensing the SAS/ACCESS products with their associated SAS Data Connector technology. Beyond that, additional functionality can be provided by our SAS In-Database offerings with the associated SAS Data Connect Accelerator technology for CAS. We use the DataTransferMode attribute in association with select caslib SrcTypes to direct CAS to use either the Data Connector technology or the Data Connect Accelerator technology.

 

Some caslib SrcTypes have only Data Connector technology. For those, the DataTransferMode option cannot be specified. But if it could, its value would effectively be "SERIAL".

 

A few caslib SrcTypes have both Data Connectors and Data Connect Accelerator as optional components. The DataTransferMode is used to specify which to use. The default value is "SERIAL" which means "use the SAS Data Connector for the specified SrcType to transfer data from the data provider". The optional value of "PARALLEL" means "use the SAS Data Connect Accelerator for the specified srcType to transfer data from the SAS In-Database Embedded Process in the data provider". And finally, the optional value of "AUTO" means "try PARALLEL if possible, else use SERIAL".

 

cas mysess1;
caslib cashive datasource=(srctype="hadoop",
                           server="hadoop.site.com",
                           dataTransferMode="auto",                   
                           username="hadoopuser",
                           hadoopconfigdir="/opt/sas/viya/config/data/hadoop/conf",                          
                           hadoopjarpath="/opt/sas/viya/config/data/hadoop/lib",
                           schema="hiveschema");

proc casutil ;
   load incaslib="cashive" casdata="endeavors" 
        outcaslib="cashive" casout="endeavors" replace ;
run ;

 

The DataTransferMode caslib attribute is not available for SrcTypes of DB2, HANA, Impala, MySQL, Oracle, Postgres, Redshift, SQLServer, Vertica, JDBC, ODBC, DNFS, HDFS, S3, or Path. It can only be specified for SrcTypes of Hadoop, Spark, SPDE, and Teradata.
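For comparison with the Hadoop example, here is a hedged sketch of a Teradata caslib that asks for the Data Connect Accelerator explicitly. The server, credentials, database, and table names are all hypothetical, and it assumes the mysess1 session exists and the SAS Embedded Process is deployed in Teradata.

```sas
/* Hypothetical Teradata connection values; replace with your own */
caslib castera sessref=mysess1
       datasource=(srctype="teradata",
                   server="teradata.site.com",
                   username="terauser",
                   password="terapass",
                   database="teradb",
                   /* PARALLEL: use the Data Connect Accelerator, not the Data Connector */
                   dataTransferMode="parallel");

proc casutil;
   load incaslib="castera" casdata="endeavors"
        outcaslib="castera" casout="endeavors" replace;
run;
```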

 

It is very important to remember that multi-node transfer (which ideally is a form of parallel data movement) is a function of SAS Data Connectors, not of the Data Connect Accelerators. This can be confusing, because CAS is directed to use the Data Connector technology when the caslib attribute DataTransferMode="SERIAL". A Data Connector's multi-node transfer capability is therefore enabled not through DataTransferMode but through other caslib attributes: NumReadNodes and NumWriteNodes.
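To make that concrete, here is a sketch of the earlier Hadoop caslib with multi-node transfer requested through the Data Connector. The caslib name and the node counts of 4 are arbitrary illustrations; the connection values are the same placeholders used above, and the mysess1 session is assumed to exist.

```sas
caslib cashive_mn sessref=mysess1
       datasource=(srctype="hadoop",
                   server="hadoop.site.com",
                   /* SERIAL selects the Data Connector, which is what provides multi-node transfer */
                   dataTransferMode="serial",
                   /* Illustrative counts of CAS workers to use for reading and writing */
                   numReadNodes=4,
                   numWriteNodes=4,
                   username="hadoopuser",
                   hadoopconfigdir="/opt/sas/viya/config/data/hadoop/conf",
                   hadoopjarpath="/opt/sas/viya/config/data/hadoop/lib",
                   schema="hiveschema");
```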

File import option: ParallelMode

CAS can also load data directly from the SAS LASR Analytic Server. Only when both CAS and LASR are deployed in MPP mode is parallel movement possible. Use the ParallelMode file import option to specify how data will move from LASR to CAS.

 

cas mysess1;
/* Caslib pointing at the LASR server; the signer URL must not contain spaces */
caslib mylasr sessref=mysess1 
              datasource=(srcType='lasr' 
                          server="lasr.site.com" port=10031 
                          username="lasruser" password="lasrpass"
                          signer="http://lasr.site.com:80/SASLASRAuthorization" 
                          );

/* Load the LASR table, trying parallel first and falling back to serial */
proc casutil; 
    load casdata="LIBREF.lyrics" casout="lyrics_lasr"
    importoptions=(filetype="lasr"  parallelmode="fallback")
   ;
run;

 

The default value is ParallelMode="FALLBACK", which directs CAS to attempt parallel transfer if possible, else fall back to serial. The value "FORCE" will perform parallel-only transfer. And the value "NONE" will perform serial-only transfer via the CAS Controller.
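The other two values can be sketched the same way. This assumes the mylasr caslib from the example above is already defined in the mysess1 session; the output table names are made up for illustration.

```sas
/* FORCE: parallel-only transfer, which needs both CAS and LASR deployed in MPP mode */
proc casutil;
    load casdata="LIBREF.lyrics" incaslib="mylasr"
         casout="lyrics_lasr_parallel" replace
    importoptions=(filetype="lasr" parallelmode="force");
run;

/* NONE: serial-only transfer via the CAS Controller */
proc casutil;
    load casdata="LIBREF.lyrics" incaslib="mylasr"
         casout="lyrics_lasr_serial" replace
    importoptions=(filetype="lasr" parallelmode="none");
run;
```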

 

CAS cannot save data back to LASR. It's a one-way street from LASR to CAS only.

That's not all

This post highlights just a few concepts describing CAS and data movement. Note that when there is only one supported option for data movement, you'll typically find that it's not possible to specify something like DataTransferMode. We saw above for the Path type of caslib that CSV, SASHDAT, and XLS/XLSX files can only transfer in serial, and so the DataTransferMode option isn't allowed as it is for SAS data sets. The same is true where parallel transfer of data with CAS is the only possible option, specifically the caslib srcTypes DNFS and HDFS. The (Amazon) S3 type of caslib is a special case: for SASHDAT files in S3, only parallel transfer is offered, and for CSV files in S3, only serial transfer is offered. In neither case can a DataTransferMode be specified.

 

My SAS Global Forum 2019 paper, Seriously Serial or Perfectly Parallel Data Transfer in SAS Viya, covers this topic in much more detail with illustrations, additional code samples, log outputs, and techniques for determining whether CAS moved data in parallel or not.

Version history
Last update:
‎06-03-2019 03:27 PM
