05-02-2018 05:01 PM
I want to export a table with a billion+ rows from SAS EG to my local machine.
I was wondering if there is any way the data can be zipped or compressed first, and then exported locally, to reduce the I/O. Any suggestions?
05-02-2018 06:14 PM
Where are you processing this data now, on local SAS or on a remote SAS server? Where is the SAS table stored? And what do you mean by export: export to what file format, or just copy as a SAS table?
05-03-2018 09:22 AM
I am accessing the data on a remote SAS server using SAS EG; the table is stored in a Sybase DB on the server.
Export to CSV or tab-separated.
My end goal is to read the data in a Python program for applying machine learning.
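For what it's worth, the Python side would not even need to unzip a compressed export first; a gzipped CSV can be streamed row by row straight from disk. A minimal sketch (the file name and columns are just placeholders, not part of any actual export):

```python
import csv
import gzip

def read_gzipped_csv(path):
    """Stream rows from a .csv.gz file without unzipping it to disk first."""
    # gzip.open in text mode ("rt") decompresses transparently as we read.
    with gzip.open(path, mode="rt", newline="") as fh:
        reader = csv.DictReader(fh)
        for row in reader:
            yield row

# Hypothetical usage:
# for row in read_gzipped_csv("table.csv.gz"):
#     process(row)
```

Because the rows are yielded one at a time, memory stays flat even for a billion-row file.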
05-03-2018 09:33 AM
Thank you for the input
I checked PROC EXPORT; it does not provide any option to compress the file before export. Since the table is massive, a plain export will take significantly longer and might time out as well.
I am working in an enterprise environment, so the default security and login restrictions apply.
05-03-2018 05:09 PM
>It does not provide any option to compress file before export.
It always compresses.
So the process is Sybase => CSV => Python script?
SAS has nothing to do with this process then.
The only value SAS could add is transferring the data with SAS/ACCESS, but for that kind of volume, and given that the end result is CSV, it is probably far more efficient to ask the Sybase admins to dump the table to a text file, zip it, transfer it, and unzip it.
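To illustrate why the zip step matters for the transfer, the compression stage of that dump-zip-transfer pipeline can be sketched in Python (the paths are hypothetical):

```python
import gzip
import shutil

def gzip_file(src_path, dst_path):
    """Compress a text dump with gzip so the transfer moves far fewer bytes."""
    # copyfileobj streams in chunks, so even a huge dump never loads fully
    # into memory.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

# Hypothetical usage:
# gzip_file("table_dump.txt", "table_dump.txt.gz")
```

Delimited text with repetitive values typically compresses very well, which is exactly why zipping before the transfer cuts the I/O so much.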
05-03-2018 05:15 PM
The access to the Sybase DB is through SAS. Unfortunately, going directly to the admins is not a possible option.
I was wondering whether SAS provides any functionality to compress the table to zip, tar.gz, or another universal format before exporting it to local disk, to reduce I/O.
05-03-2018 05:41 PM
You could use PROC EXPORT, but that doesn't provide many tuning options.
You could use PROC SQL, which works a little bit closer to the database. And if you have SAS 9.4 M5, you can GZIP the output "on the fly."
ods _all_ close;
filename out ZIP "/u/myaccount/project/table.csv.gz" GZIP;
ods csv file=out;
proc sql;
  select * from sashelp.class;
quit;
ods csv close;
ODS CSV might not be the fastest way to write the output, but at least the final result should be compressed. It's worth testing with just a subset first to see if it works for you.
05-03-2018 06:41 PM
@ChrisHemedinger's solution is probably the best you can do.
Just point proc sql to your Sybase data and download.
The one change I would make is to split the download.
You'll get disconnections or errors, and you don't want to start over.
Just make your Python script scan several files instead of one, if you can.
You can split by a variable of your choice, ideally a refresh or capture date of some sort.
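If the table is exported as several part files, the Python side can treat them as one stream. A minimal sketch, assuming a hypothetical `part_*.csv.gz` naming scheme:

```python
import csv
import glob
import gzip

def iter_part_files(pattern):
    """Read rows from all matching part files as one continuous stream."""
    # sorted() gives a stable, deterministic file order; gzip.open handles
    # compressed parts, plain open handles uncompressed ones.
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, mode="rt", newline="") as fh:
            reader = csv.DictReader(fh)
            for row in reader:
                yield row

# Hypothetical usage:
# for row in iter_part_files("/data/export/part_*.csv.gz"):
#     process(row)
```

This also gives a natural restart point: if a transfer dies, you re-download only the missing part files, not the whole table.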