06-01-2015 04:24 AM
I am trying to copy data from the LASR server to HDFS as a backup. The data in LASR is a compressed dataset with around 25 million records.
I am using PROC OLIPHANT for this task, but it is taking a lot of time, approximately 1.5 hours:
proc oliphant host="&host"
   install="&tkgridinstall";
   add &secdat. path="&hdfspath" replace;
run;
Is there a more efficient way to do this task?
06-01-2015 06:39 AM
Thanks for the response. I tried PROC IMSTAT, but that is also taking a long time to execute. I haven't tried PROC LASR as it is somehow not executing at our site; I need to work with the SAS admin to get it resolved.
06-01-2015 09:17 AM
What is &secdat in your example? Is it a table stored on LASR? Are you using the SASIOLA libname engine?
PROC OLIPHANT can load a table from the SAS server (or a table accessible from the SAS server) to SASHDAT. This is not what you need. You want to save an in-memory LASR table (SASIOLA) to disk (SASHDAT). Use PROC IMSTAT instead.
Can you share the IMSTAT code you have used? Also the libname statement that creates the libraries.
What is the version of SAS you are using? What is the size (in GB) of the table? How many nodes and how much memory do you have?
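For reference, saving an in-memory LASR table to SASHDAT with IMSTAT would look roughly like this. This is only a sketch: the libref, tag, and HDFS path are placeholders for your site's values.

   /* connect to the LASR server through the SASIOLA engine */
   libname lasrlib sasiola tag=hps port=&port host="&host" signer="&signer";

   /* save the in-memory table to SASHDAT on HDFS */
   proc imstat data=lasrlib.&secdat;
      save path="/your/hdfs/path" replace;
   quit;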
06-02-2015 02:45 AM
Thanks for your response.
A) &secdat is the LASR table, as you guessed, and I am using the SASIOLA engine.
B) Below is the IMSTAT code that I had used.
C) We are using SAS version 9.4.
The dataset that I am trying to back up is a compressed LASR table of 2.59 GB.
In LASR we have 3 nodes, and the memory available is 1.5 TB, i.e. 500 GB per node.
LIBNAME VALIBLA SASIOLA TAG=HPS PORT=&port HOST="&host" SIGNER="&signer";
proc imstat data=VALIBLA.&dat;
   where account_number is not null;
   save path="/hps" copies=1;
quit;
Are we missing anything in the above code?
1) Even when I tried to save a small dataset of 2 MB, it ran for 20 minutes without completing and I had to kill it.
2) Can we execute the above code without the WHERE clause?
06-02-2015 11:44 AM
Your data size and HW sizing seem to be OK.
2) Yes, you can use it without the WHERE statement. If you try it that way, can you save the table?
06-08-2015 06:13 AM
Thanks for your response. I was facing issues because a couple of nodes on HDFS were down.
It is working now, and yes, we can save the table. Instead of the COPIES option I used the REPLACE option.
06-09-2015 02:16 AM
Thanks for the response. Since yesterday, I have been trying to post a new question but keep getting the error below.
Not allowed to post content more than once every 60 seconds
Hence I am posting the question in the same thread. Not sure if this is the right way to go about it; any insight would be appreciated.
I have a question on LASR server joins. We have copied both tables into the LASR server and are doing an SQL join using the SAS DI join transformation.
My question is, in this scenario:
A) Would the join processing happen in the LASR server?
B) If the join happens in the Workspace Server, would it copy both tables from LASR to the workspace and perform the join there?
So would it occupy both disk space and memory, or only memory?
06-09-2015 03:42 AM
If you use PROC SQL:
B) Yes, the join happens on the Workspace Server. In this case LASR would act just as a simple data provider.
Instead, use the PROC IMSTAT SCHEMA statement.
Or the PROC IMSTAT SCORE statement with a hash object.
Or, if you store the tables in Hive/Impala, use the Hadoop/Impala access engines with PROC SQL.
Or use native Hadoop tools.
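As a rough example, an in-memory join with the SCHEMA statement could be sketched as follows. The table names and key column here are hypothetical, and the exact key-pair syntax varies; verify the SCHEMA statement documentation for your LASR release.

   libname lasrlib sasiola tag=hps port=&port host="&host" signer="&signer";

   proc imstat;
      /* make the fact table (held in LASR memory) the active table */
      table lasrlib.orders;
      /* attach the dimension table on the shared key;
         the join is resolved inside the LASR server, so nothing
         is copied down to the Workspace Server */
      schema customers (custid = custid);
   quit;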
06-10-2015 02:22 AM
We did a small exercise of executing the ETL script and monitoring the work area and resource utilization while the ETL was executing.
In the ETL script, we copy data from SAS to LASR and do the join using the SAS DI transformation.
While doing this, we monitored the SASWORK area to check whether the space usage goes up during execution, but we did not see any difference in the available space before and after execution. Is there something that we missed capturing during this exercise?
06-10-2015 03:46 AM
Of course there will be no difference in available disk space before and after the execution, because SAS cleans up after executing (joining) and uploading the results.
But you also write that you were monitoring while the ETL was executing.
It might also be possible that you used small datasets and everything happened in the memory of the SAS Workspace Server.
Some options that can help monitoring:
options sastrace=',,,d' sastraceloc=saslog;
Also check the UTILLOC option. Sometimes it points to a different location than WORK.
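To check where WORK and UTILLOC actually point at your site, you can print them to the log:

   /* writes the current values of both options to the SAS log */
   proc options option=(work utilloc);
   run;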
Could you attach the code that was generated by DI Studio?