Hi,
I want to export a table with a billion-plus rows from SAS EG to my local machine.
I was wondering if there is any way the data can be zipped or compressed first, and then exported locally, to reduce the I/O. Any suggestions?
Where are you processing this data now - on local SAS or on a remote SAS server? Where is the SAS table stored? And what do you mean by export - export to what file format, or just copy as a SAS table?
I am accessing the data on a remote SAS server using SAS EG; the SAS table is stored in a Sybase DB on the server.
Export to CSV or tab-separated.
My end goal is to read the data in a Python program for applying machine learning.
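Since the end goal is reading the exported CSV in Python, here is a minimal sketch of streaming the file in batches so that a billion-row table never has to fit in memory at once. The file path, column names, and batch size are placeholders, and this uses only the standard library - swap in pandas or another reader as your ML pipeline requires:

```python
import csv

def stream_rows(path, batch_size=100_000):
    """Yield lists of row dicts from a large CSV, batch_size rows at a time,
    so the whole file never has to be loaded into memory."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                yield batch
                batch = []
        if batch:  # remaining rows after the last full batch
            yield batch
```

Each yielded batch can then be fed to an incremental/partial-fit learner or written onward in chunks.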
Valid questions by @SASKiwi.
Look at proc cport.
It's quite old, so I am unsure how it compares with zipping in terms of file size.
It's very handy for whole libraries, but you may be better off just zipping if it's only one file and both ends run the same OS.
Hi @ChrisNZ
Thank you for the input
I checked proc cport. It does not provide any option to compress the file before export. As the table is massive, a plain export will take significantly longer and might time out as well.
I am working in an Enterprise so the default security and login restrictions apply
>It does not provide any option to compress file before export.
It always compresses.
So the process is Sybase => CSV => Python script?
SAS has nothing to do with this process then.
The only value SAS can add is transferring with SAS/ACCESS, but for that kind of volume, and considering the end result is a CSV, it is probably far more efficient to ask the Sybase admins to dump the table to a text file, zip it, transfer it, and unzip it.
Hi @ChrisNZ
The access to the Sybase DB is through SAS. Unfortunately, going directly to the admins is not a possible option.
I was wondering whether SAS provides any functionality to compress the table to zip, tar.gz, or another universal format before exporting it to local disk, to reduce I/O.
Thank you!
You could use PROC EXPORT, but that doesn't provide many tuning options.
You could use PROC SQL, which works a little closer to the database. And if you have SAS 9.4 M5, you can GZIP the output "on the fly."
/* Close any open ODS destinations first */
ods _all_ close;

/* Write straight to a gzipped file - the GZIP option requires SAS 9.4 M5 */
filename out ZIP "/u/myaccount/project/table.csv.gz" GZIP;

ods csv file=out;
proc sql;
   select * from sashelp.class;
quit;
ods csv close;
ODS CSV might not be the fastest at writing output, but at least the final result should be compressed. It's worth testing with just a subset first to see if it works for you.
@ChrisHemedinger Thank you! seems like a viable option 🙂
@ChrisHemedinger's solution is probably the best you can do.
Just point proc sql to your Sybase data and download.
The one change I would make is to split the download.
You'll get disconnections or errors, and you don't want to start over.
Just make your Python script scan several files instead of one if you can.
You can split by a variable of your choice - hopefully a refresh or capture date of some sort.
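If the download is split into several part files as suggested, the Python side can scan them all with a glob pattern and read each gzipped part in turn. A minimal standard-library sketch - the part*.csv.gz naming convention is an assumption, not something SAS produces by default:

```python
import csv
import glob
import gzip

def iter_parts(pattern):
    """Yield row dicts from every gzipped CSV part matching the glob pattern,
    in sorted filename order, decompressing each part on the fly."""
    for path in sorted(glob.glob(pattern)):
        # mode="rt" decompresses and decodes to text for the csv reader
        with gzip.open(path, mode="rt", newline="") as f:
            yield from csv.DictReader(f)
```

Because it yields rows lazily, a failed transfer only means re-downloading the missing part, not restarting the whole billion-row export.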