Evgeny_
Fluorite | Level 6

Hi All,

 

We have two separate environments (both SAS 9.4). In one environment (on-premise Linux) we have all our flows, DI jobs, and therefore all warehouse tables. In the other environment (multiple AWS instances) we just run our distributed VA (7.3). Currently, we zip up our warehouse tables, push them to the VA server, unpack them, and autoload them to LASR. This involves scheduling a number of scripts and also requires enough space to drop the archive, which raises our AWS cost.

 

We would like to write from DI Studio jobs directly to LASR; however, our VA environment does not have SAS/CONNECT. A suggestion was to define a LASR libname and run a data step:

 

data lasr_lib.table_name; 
    set source_lib.table_name;
run;

 

That approach did not work: the table did not load to LASR (and it also had to be unloaded beforehand). However, the above piece of code works with the (append=yes) option. Hence, I don't have to unload the table; I just purge all the records from the LASR table and append fresh data. However, I am not sure how APPEND will perform for huge data files.
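For reference, a minimal sketch of the purge-and-append pattern described above. The host name, port, tag, and library/table names are placeholders, not values from this thread, and the PROC IMSTAT step assumes the in-memory table already exists:

```sas
/* Assumed remote LASR connection; host, port, and tag are placeholders */
libname lasr_lib sasiola host="va-server.example.com" port=10010 tag="hps";

/* Purge all existing rows from the in-memory table */
proc imstat;
    table lasr_lib.table_name;
    deleterows / purge;
quit;

/* Append the fresh data without unloading the table first */
data lasr_lib.table_name (append=yes);
    set source_lib.table_name;
run;
```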

 

 

My question is: what is the best way to push tables to VA LASR when it is a separate environment? Maybe push them to VA Hadoop (HDFS) first and then load locally to LASR?

 

Thanks!

5 REPLIES
alexal
SAS Employee

@Evgeny_,

 

You need to use the SAS LASR Analytic Server Access Tools. Beginning with the third maintenance release for SAS® 9.4, SAS® Integration Technologies includes the SAS LASR Analytic Server Access Tools. The SAS LASR Analytic Server Access Tools include two engines: the SASIOLA engine and the SASHDAT engine. These engines make it possible to copy data from an environment without a SAS LASR Analytic Server to a remote SAS LASR Analytic Server or Hadoop Distributed File System (HDFS).
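A rough sketch of how the two engines might be assigned from the environment without a LASR server; the host names, port, install location, and HDFS path below are assumptions for illustration:

```sas
/* SASIOLA engine: write directly to a remote SAS LASR Analytic Server */
libname lasr sasiola host="lasr-head.example.com" port=10010 tag="hps";

/* SASHDAT engine: write to the remote HDFS instead */
libname hdat sashdat host="hadoop-namenode.example.com"
                     install="/opt/TKGrid" path="/user/lasr";

data hdat.table_name;
    set warehouse.table_name;
run;
```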

 

SAS Usage Note 56996: Tips for using the SAS® LASR™ Analytic Server Access Tools

 

Let me know if you have any questions.

Evgeny_
Fluorite | Level 6

@alexal,

 

I guess SASHDAT might be a possible solution for me. However, when trying to run the libname statement, it throws an error:

 
ERROR: Failed to enumerate available compute nodes in the distributed computing environment. Make sure that the host and install location are specified properly and that you can make a connection via passwordless ssh to the host machine.
 
But if I run the same libname locally in VA, it works. Do I have to open any specific port on the VA side? Passwordless SSH is already established between the two nodes, and I can ssh and telnet to port 22.
 
Thanks!
alexal
SAS Employee

@Evgeny_,

 

That's right, you have to configure passwordless SSH between these environments and specify the user name and SSH key with the TKSSH_USER and TKSSH_IDENTITY options.
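A sketch of what setting those options might look like before assigning the libname; the user name, key path, and libname values are placeholders:

```sas
/* SSH user and private key used by the LASR access tools */
options set=TKSSH_USER="sasuser";
options set=TKSSH_IDENTITY="/home/sasuser/.ssh/id_rsa";

/* Then assign the remote engine libname as usual */
libname hdat sashdat host="hadoop-namenode.example.com"
                     install="/opt/TKGrid" path="/user/lasr";
```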

SASKiwi
PROC Star

If @alexal's suggestions won't work for you because you are on an earlier SAS 9.4 maintenance level, there is another option: we were able to negotiate with SAS a free, limited SAS/CONNECT license to solve this problem. We are on SAS 9.4M2 connecting to SAS VA 7.3 (SAS 9.4M3).

 

SAS/CONNECT works brilliantly to enable end-to-end loading of VA from our primary SAS environment all in a single job.

ThomasPalm
Obsidian | Level 7

One simple thing to try is to use the COMPRESS option on your dataset from DI.

You don't have to uncompress it before the load.
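A sketch of what that could look like in the DI job that writes the warehouse table; the library and table names are placeholders:

```sas
/* Write the table compressed so no separate zip/unzip step is needed */
data warehouse.table_name (compress=yes);
    set work.staged_table;
run;
```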

 


