Hi,

I am trying to copy data from a LASR server to HDFS as a backup. The data in LASR is a compressed dataset with around 25 million records.

I am using PROC OLIPHANT for this task, but it takes a long time, approximately 1.5 hours.

proc oliphant host="&host"
              install="&tkgridinstall";
   add &secdat. path="&hdfspath" replace;
run;

Is there a more efficient way to do this?

Thanks,

Nikhil

1 ACCEPTED SOLUTION

Accepted Solutions
MikeMcKiernan
SAS Employee

REPLACE enables you to overwrite the file in HDFS.  COPIES= specifies how many redundant copies to make of the file.

LinusH
Tourmaline | Level 20

PROC LASR?

Data never sleeps
nikhil_khanolkar
Calcite | Level 5

Hi Linus,

Thanks for the response. I tried PROC IMSTAT, but that also takes a long time to execute. I haven't tried PROC LASR because it does not execute at our site for some reason. I need to work with our SAS admin to get that resolved. :)

Thanks,

Nikhil

gergely_batho
SAS Employee

Hi,

What is &secdat in your example? Is it a table stored on LASR? Are you using the SASIOLA libname engine?

PROC OLIPHANT can load a table from the SAS server (or a table accessible from the SAS server) to SASHDAT. This is not what you need. You want to save an in-memory LASR table (SASIOLA) to disk (SASHDAT). Use IMSTAT instead.


Can you share the IMSTAT code you have used? Also the libname statement that creates the libraries.

What is the version of SAS you are using? What is the size (in Gbytes) of the table? How many nodes and memory do you have?

nikhil_khanolkar
Calcite | Level 5

Hi Gergely,

Thanks for your response.

A) &secdat is the LASR table, as you guessed, and I am using the SASIOLA engine.

B) The IMSTAT code I used is below.

C) We are using SAS version 9.4.

      The dataset I am trying to back up is a compressed LASR table of 2.59 GB.

      In LASR we have 3 nodes, and the available memory is 1.5 TB, i.e. 500 GB per node.

Code used:

LIBNAME VALIBLA SASIOLA TAG=HPS PORT=&port HOST="&host" SIGNER="&signer";

proc imstat data=VALIBLA.&dat;
   where account_number is not null;
   save path="/hps" copies=1;
run;

Are we missing anything in the above code?

1) Even when I tried to save a small dataset of 2 MB, it ran for 20 minutes without completing and I had to kill it.

2) Can we execute the above code without the WHERE clause?

Thanks,

Nikhil

gergely_batho
SAS Employee

Your data size and hardware sizing seem to be OK.

2) Yes, you can use it without the WHERE statement. If you try it, can you save the table that way?

nikhil_khanolkar
Calcite | Level 5

Hi Gergely,

Thanks for your response. I was facing issues because a couple of nodes on HDFS were down.

It is working now, and yes, we can save the table. Instead of the COPIES= option I used the REPLACE option.

Nikhil

MikeMcKiernan
SAS Employee

REPLACE enables you to overwrite the file in HDFS.  COPIES= specifies how many redundant copies to make of the file.
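
As an illustration of the two options used together, here is a minimal sketch based on the IMSTAT code posted earlier in this thread. It assumes the same SASIOLA libref and macro variables (&port, &host, &signer, &dat) and the same "/hps" path; none of these have been verified against a particular site.

```sas
/* Assumes the libref and macro variables from the earlier post in this thread */
libname valibla sasiola tag=hps port=&port host="&host" signer="&signer";

proc imstat data=valibla.&dat;
   /* replace : overwrite an existing file at this HDFS path  */
   /* copies= : number of redundant copies to write in HDFS   */
   save path="/hps" copies=1 replace;
quit;
```

With no WHERE statement, the whole in-memory table is saved.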

nikhil_khanolkar
Calcite | Level 5

Hi Mike,

Thanks for the response. Since yesterday I have been trying to post a new question, but I get the error below.

Not allowed to post content more than once every 60 seconds

Hence I am posting the question in the same thread. I am not sure if this is the right way to go about it. Any insight would be appreciated.

I have a question about joins on the LASR server. We have copied both tables to the LASR server and are doing a SQL join using the SAS DI Studio Join transformation.

My question is, in this scenario:

A) Would the join processing happen on the LASR server?

B) If the join happens on the Workspace Server, would it copy both tables from LASR to the Workspace Server and perform the join there?

    If so, would it occupy both disk space and memory, or only memory?


Thanks,

Nikhil

gergely_batho
SAS Employee

If you use PROC SQL:

A) No.

B) Yes, the join runs on the Workspace Server. In this case LASR acts as a simple data provider.

Instead, use the PROC IMSTAT SCHEMA statement.

Or the PROC IMSTAT SCORE statement with a hash object.

Or, if you store the tables in Hive/Impala, use the Hadoop/Impala access engines with PROC SQL.

Or use native Hadoop tools.
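
The SCHEMA suggestion above could look roughly like the sketch below. The table names ORDERS (fact) and CUSTOMERS (dimension) and the key CUST_ID are hypothetical placeholders, and VALIBLA is assumed to be the SASIOLA libref from earlier in the thread. Both tables are assumed to be loaded in the same LASR server under the same tag. Please check the exact SCHEMA syntax and options against the IMSTAT documentation for your release before relying on this.

```sas
/* Hypothetical names: ORDERS, CUSTOMERS and CUST_ID are placeholders */
proc imstat;
   table valibla.orders;                 /* fact table already in LASR        */
   schema customers (cust_id=cust_id);   /* dimension-key = fact-key          */
run;
   /* The joined result is a temporary in-memory table; its name is           */
   /* returned in the &_templast_ macro variable.                             */
   table valibla.&_templast_;
   numrows;
quit;
```

Because the join is carried out inside the LASR server, the tables should not need to be pulled back to the Workspace Server as they are with PROC SQL.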

nikhil_khanolkar
Calcite | Level 5

Hi Gergely,

We did a small exercise: we executed the ETL script and monitored the work area and resource utilization while it was running.

In the ETL script, we copy data from SAS to LASR and do the join using the SAS DI Studio transformation.

While doing this we monitored the SASWORK area to check whether usage went up during execution, and we did not see any difference in the available space before and after execution. Is there something we failed to capture during this exercise?

Nikhil

gergely_batho
SAS Employee

Hi,

Of course there will be no difference in available disk space before and after the execution, because SAS cleans up after executing the join and uploading the results.

But you also write that you were monitoring while the ETL was executing.

It is also possible that you used small datasets and everything happened in the memory of the SAS Workspace Server.

Some options that can help monitoring:

options sastrace=',,,d' sastraceloc=saslog;

options fullstimer;

options msglevel=i;

Also check the UTILLOC option. Sometimes it points to a different location than WORK.
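
A quick way to see where WORK and UTILLOC currently point is sketched below. It uses only standard SAS calls and should be run in the same session (or with the same configuration) as the ETL job, so that it reports the locations that job actually uses.

```sas
/* Show the current values of the WORK and UTILLOC system options */
proc options option=(work utilloc) value;
run;

/* Or write the resolved locations straight to the log */
%put NOTE: WORK path is %sysfunc(pathname(work));
%put NOTE: UTILLOC is %sysfunc(getoption(utilloc));
```

If UTILLOC resolves to a directory other than WORK, that directory is the one to watch for utility files while the join is running.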

Could you attach the code that was generated by DI Studio?
