Export big data(Billion + rows) from SAS table to disk

Occasional Contributor
Posts: 8

Hi,

I want to export a table with a billion+ rows from SAS EG to my local machine.

I was wondering if there is any way to zip or compress the data first and then export it to my local machine, to reduce the I/O. Any suggestions?

Super User
Posts: 3,918

Re: Export big data(Billion + rows) from SAS table to disk

Where are you processing this data now? On local SAS or on a remote SAS server? Where is the SAS table stored? And what do you mean by export: export to what file format, or just copy it as a SAS table?

Occasional Contributor
Posts: 8

Re: Export big data(Billion + rows) from SAS table to disk

I am accessing data on a remote SAS server using SAS EG. The SAS table is stored in a Sybase DB on the server.

I want to export to CSV or tab-separated text.

My end goal is to read the data in a Python program to apply machine learning.

PROC Star
Posts: 2,348

Re: Export big data(Billion + rows) from SAS table to disk

Valid questions by @SASKiwi.

Look at PROC CPORT.

It's quite old, so I am unsure how it compares with zipping in terms of file size.

It's very handy for whole libraries, but you may be better off just zipping if it's only one file and the sides have the same OS.

 

Occasional Contributor
Posts: 8

Re: Export big data(Billion + rows) from SAS table to disk

Hi @ChrisNZ

Thank you for the input.

I checked PROC EXPORT. It does not provide any option to compress the file before export. Because the table is massive, a plain export will take significantly longer and might time out as well.

I am working in an enterprise environment, so the default security and login restrictions apply.

PROC Star
Posts: 2,348

Re: Export big data(Billion + rows) from SAS table to disk

 >It does not provide any option to compress file before export. 

It always compresses.

 

So the process is Sybase => CSV => Python script?

SAS has nothing to do with this process then.

The only value SAS can add is transferring the data with SAS/ACCESS. For that kind of volume, and considering the end result is CSV, it is probably a lot more efficient to ask the Sybase admins to dump the table to a text file, zip it, transfer it and unzip it.

Occasional Contributor
Posts: 8

Re: Export big data(Billion + rows) from SAS table to disk

Hi @ChrisNZ

Access to the Sybase DB is through SAS. Unfortunately, going directly to the admins is not a possible option.

I was wondering whether SAS provides any functionality to compress the table to zip or tarz (or any other universal format) before exporting it to local disk, to reduce I/O.

Thank you!

Community Manager
Posts: 3,424

Re: Export big data(Billion + rows) from SAS table to disk

You could use PROC EXPORT, but that doesn't provide many tuning options.

 

You could use PROC SQL, which works a little bit closer to the database. And if you have SAS 9.4 M5 or later, you can GZIP the output "on the fly."

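/* Close every open ODS destination so only the CSV below is written */
ods _all_ close;
/* FILENAME ZIP with the GZIP option writes gzip-compressed output */
filename out ZIP "/u/myaccount/project/table.csv.gz" GZIP;
ods csv file=out;
proc sql;
 /* sashelp.class is a small sample table; point this at your own table */
 select * from sashelp.class;
quit;
ods csv close;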

ODS CSV might not be the fastest at writing the output, but at least the final result should be compressed. It's worth testing with just a subset at first to see whether it works for you.
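On the Python side, the gzipped file can be read without unzipping it first, which makes it easy to check a small test export. This is a minimal sketch using only the standard library. It assumes the file is a gzip-compressed CSV whose first line is a header row and that it is UTF-8 encoded (the actual encoding depends on the SAS session). The function name `iter_rows` is made up for illustration.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, decompressing the .gz file as it is read."""
    # "rt" opens the gzip stream in text mode; newline="" is what the csv
    # module expects. The UTF-8 encoding is an assumption about the export.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row
```

Because rows are yielded one at a time, the whole table never has to fit in memory, which matters at a billion rows.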

 

Occasional Contributor
Posts: 8

Re: Export big data(Billion + rows) from SAS table to disk

Posted in reply to ChrisHemedinger

@ChrisHemedinger Thank you! That seems like a viable option.

PROC Star
Posts: 2,348

Re: Export big data(Billion + rows) from SAS table to disk

@ChrisHemedinger's solution is probably the best you can do.

Just point proc sql to your Sybase data and download.

 

The one change I would make is to split the download.

You will likely get disconnections or errors, and you don't want to start over.

If you can, just make your Python script scan several files instead of one.

You can split by a variable of your choice, ideally a refresh or capture date of some sort.
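Scanning several files from Python could look roughly like the sketch below. It assumes each part is a gzip-compressed, UTF-8 CSV with its own header row, and that the parts share a naming pattern. The function name `iter_parts` and any file pattern passed to it are hypothetical.

```python
import csv
import glob
import gzip
from typing import Dict, Iterator


def iter_parts(pattern: str) -> Iterator[Dict[str, str]]:
    """Yield rows from every gzipped CSV part matching the glob pattern."""
    # Sorting the file names gives a stable, repeatable processing order.
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
            # Each part is assumed to carry its own header row.
            yield from csv.DictReader(handle)
```

If one part fails to download, only that part needs to be exported again, and the loop picks it up on the next run.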
