Exploring, modeling, predicting and reporting with SAS Visual Analytics and SAS Visual Statistics

Copying data into HDFS from LASR server

Contributor
Posts: 31

Hi,

I am trying to copy data from the LASR server to HDFS as a backup. The data in LASR is a compressed table with around 25 million records.

I am using PROC OLIPHANT for this task, but it takes a long time, approximately 1.5 hours.

proc oliphant host="&host" install="&tkgridinstall";
   add &secdat. path="&hdfspath" replace;
run;

Is there a more efficient way to do this?

Thanks,

Nikhil


All Replies
Esteemed Advisor
Posts: 5,068

Re: Copying data into HDFS from LASR server

PROC LASR?

Data never sleeps
Contributor
Posts: 31

Re: Copying data into HDFS from LASR server

Hi Linus,

Thanks for the response. I tried PROC IMSTAT, but that also takes a long time to execute. I haven't tried PROC LASR, as for some reason it does not execute at our site. I need to work with our SAS admin to get that resolved. :)

Thanks,

Nikhil

SAS Employee
Posts: 340

Re: Copying data into HDFS from LASR server

Hi,

What is &secdat in your example? Is it a table stored on LASR? Are you using the SASIOLA libname engine?

PROC OLIPHANT can load a table from the SAS server (or a table accessible from the SAS server) to SASHDAT. This is not what you need. You want to save an in-memory LASR table (SASIOLA) to disk (SASHDAT). Use IMSTAT instead.


Can you share the IMSTAT code you have used? Also the libname statement that creates the libraries.

Which version of SAS are you using? What is the size (in GB) of the table? How many nodes and how much memory do you have?

Contributor
Posts: 31

Re: Copying data into HDFS from LASR server

Hi Gergely,

Thanks for your response.

A) &secdat is the LASR table, as you guessed, and I am using the SASIOLA engine.

B) Below is the IMSTAT code that I used.

C) We are using SAS version 9.4.

      The dataset that I am trying to back up is a compressed LASR table of 2.59 GB.

      In LASR we have 3 nodes, and the memory available is 1.5 TB, i.e. 500 GB per node.

Code used:

LIBNAME VALIBLA SASIOLA TAG=HPS PORT=&port HOST="&host" SIGNER="&signer";

proc imstat data=VALIBLA.&dat;
   where account_number is not null;
   save path="/hps" copies=1;
run;

Are we missing anything in the above code?

1) Even when I tried to save a small dataset of 2 MB, it ran for 20 minutes without completing and I had to kill it.

2) Can we execute the above code without the WHERE clause?

Thanks,

Nikhil

SAS Employee
Posts: 340

Re: Copying data into HDFS from LASR server

Your data size and hardware sizing seem to be OK.

2) Yes, you can use it without the WHERE statement. If you try it, can you save the table that way?

Contributor
Posts: 31

Re: Copying data into HDFS from LASR server

Hi Gergely,

Thanks for your response. I was facing issues because a couple of nodes on HDFS were down.

It is working now, and yes, we can save the table. Instead of the COPIES= option I used the REPLACE option.

Nikhil

Solution
‎06-08-2015 04:02 PM
SAS Employee
Posts: 11

Re: Copying data into HDFS from LASR server

REPLACE enables you to overwrite the file in HDFS.  COPIES= specifies how many redundant copies to make of the file.
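
For reference, the working call described above might look roughly like the sketch below. It reuses the libref, macro variables, and "/hps" path from the code posted earlier in this thread and drops the WHERE statement; those values are site-specific, so treat this as an illustration rather than the exact code that was run.

```sas
/* Sketch only: libref, macro variables, and the /hps path come from the thread */
libname valibla sasiola tag=hps port=&port host="&host" signer="&signer";

proc imstat data=valibla.&dat;
   /* REPLACE overwrites an existing file at the target path;          */
   /* COPIES= sets how many redundant copies of the file are written   */
   save path="/hps" replace copies=1;
run;
quit;
```

The two options are independent, so they can be given together on the same SAVE statement.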

Contributor
Posts: 31

Re: Copying data into HDFS from LASR server

Hi Mike,

Thanks for the response. Since yesterday I have been trying to post a new question, but I keep getting the error below.

Not allowed to post content more than once every 60 seconds

Hence I am posting the question in this thread. I am not sure if this is the right way to go about it. Any insight would be appreciated.

I have a question about joins on the LASR server. We have copied both tables into the LASR server and are doing an SQL join using the SAS DI Join transformation.

My question is, in this scenario:

A) Would the join processing happen in the LASR server?

B) If the join happens on the Workspace Server, would it copy both tables from LASR to the Workspace Server and perform the join there?

    If so, would it occupy both disk space and memory, or only memory?


Thanks,

Nikhil

SAS Employee
Posts: 340

Re: Copying data into HDFS from LASR server

If you use PROC SQL:

A) No.

B) Yes, on the Workspace Server. In this case LASR would act just as a simple data provider.

Instead, use the PROC IMSTAT SCHEMA statement.

Or the PROC IMSTAT SCORE statement with a hash object.

Or, if you store the tables in Hive/Impala, use the Hadoop/Impala access engines with PROC SQL.

Or use native Hadoop tools.
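
As a rough illustration of the first alternative, an in-memory join with the SCHEMA statement could look something like the sketch below. The table names (orders, customers, orders_joined) and the key column cust_id are hypothetical, and the libref reuses the connection settings posted earlier in the thread; check the IMSTAT documentation for your release before relying on the options shown.

```sas
/* Sketch only: orders, customers, orders_joined and cust_id are hypothetical names */
libname valibla sasiola tag=hps port=&port host="&host" signer="&signer";

proc imstat;
   /* The active table plays the role of the fact table */
   table valibla.orders;
   /* Join the dimension table on (fact-key = dimension-key);      */
   /* MODE=TABLE asks for a temporary in-memory table as the result */
   schema customers (cust_id = cust_id) / mode=table;
run;
   /* The temporary result is referenced through the _TEMPLAST_ macro variable */
   table valibla.&_templast_;
   /* Give the temporary table a regular name so that it persists */
   promote orders_joined;
run;
quit;
```

With this approach the join is carried out inside the LASR server, so the tables are not pulled back to the Workspace Server.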

Contributor
Posts: 31

Re: Copying data into HDFS from LASR server

Hi Gergely,

We did a small exercise: we executed the ETL script and monitored the WORK area and resource utilization while the ETL was running.

In the ETL script we copy data from SAS to LASR and do the join using the SAS DI Join transformation.

While doing this, we monitored the SASWORK area to check whether usage went up during execution, and we did not see any difference in the available space before and after execution. Is there something that we failed to capture during this exercise?

Nikhil

SAS Employee
Posts: 340

Re: Copying data into HDFS from LASR server

Hi,

Of course there will be no difference in available disk space before and after the execution, because SAS cleans up after executing the join and uploading the results.

But you also write that you were monitoring while the ETL was executing.

It is also possible that you used small datasets and everything happened in the memory of the SAS Workspace Server.

Some options that can help monitoring:

options sastrace=',,,d' sastraceloc=saslog;
options fullstimer;
options msglevel=i;

Also check the UTILLOC option. Sometimes it points to a different location than WORK.
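
One quick way to see where both locations point for the current session is to write the option values to the log, as in this small sketch:

```sas
/* Write the current WORK and UTILLOC settings to the SAS log */
%put NOTE: WORK=%sysfunc(getoption(work));
%put NOTE: UTILLOC=%sysfunc(getoption(utilloc));
```

If UTILLOC resolves to a directory other than WORK, that directory needs to be monitored as well while the ETL job runs.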

Could you attach the code that was generated by DI Studio?
