I am creating a sas dataset with around 85-90 columns in it and some of columns are of big size (2048) so the dataset is coming of very large size i.e. 1331622 observations are coming as of 14.9 GB size. I have a limitation that file size should be max of 4 GB so that it can pass through some transferring channel. Please let me know if there is any way to reduce the size of dataset without using COMPRESS option and the column length should be same as what available in database(means the column length and datatype should be same as that of database through which I am fetching data). .
If you are not allowed to compress (why?) then splitting your dataset is the only option I can think of.
* untested;
data part1 part2 part3;
set have;
if _N_ <= 400000 then output part1;
else if _N_ <= 800000 then output part2;
else output part3;
run;
* and glue them together at the other end of the line;
data want;
set part1 part2 part3;
run;
Hth,
Eric
Well, this really depends on various things. You haven't shown an example of the data so am just guessing here but some things to consider:
Apply coding to you dataset. For instance, those columns which have long text strings, are they similar strings? If so then pull a distinct list out and then apply as a format. That alone will drastically reduce your dataset. Code any other possible variables, use formats, or related datasets, for instance, consider:
ID VAR1 BP HR COMMENTS AGE SOMETHINGELSE ANOTHERCOMMENT
...
Now the above could all go in one dataset, but is that the most efficient use of space? Perhaps have a comments dataset separate from this and assign a linking variable. Maybe BP and HR could be moved to a separate vitals signs dataset using ID? You can also split the dataset as EricHoogenboom has said.
There are other methods as well, why for instance, does the transfer only allow 4gb? Doesn't seem to be a valid reason. Are you sending by email, that in itself is a bad idea. Use a controlled portal to transfer data. Use a controlled web portal and stream CSV data across is another method (pro http will download a stream of CSV data for instance).
Also, have you tried Zipping the file, I.e. use a proper ZIP program (not windows compressed folder) and choose maximum compression. Or try XPT from SAS.
As you can see there are a variety of methods, both coding, and utility programs for such a task depending on your situation.
What keeps you from using the COMPRESS option? It is the big helper for exactly the situation you're in.
Hi,
I am not using compress because compress option increases the number of cpu cycle for accessing the data and it can impact the performance of an application that is why I am looking for some alternative.
Thanks,
Pallavi
 Pallavi_Varsh did you try the compressing and prove the results wuiht your expectations ? With a spinning disk a compress=binary you could effectively win a lot time just by all that IO overhead. I have seen an increase with a total record-size of 80bytes and having a SSD. that number of 1,3M record is not that big  you must have some 10Kbyte recordsize.   I am Expecting just performance gains.
 Pallavi_Varsh did you try the compressing and prove the results wuiht your expectations ? With a spinning disk a compress=binary you could effectively win a lot time just by all that IO overhead. I have seen an increase with a total record-size of 80bytes and having a SSD. that number of 1,3M record is not that big  you must have some 10Kbyte recordsize.   I am Expecting just performance gains.       
In our experience the slightly increased overhead in CPU is more than compensated by the big savings in IO, resulting in much faster run times.
As the others said, your additional CPU cycles will be offset by the increased I/O performance. Rule of thumb: with the exception of the STAT procedures, SAS is always I/O bound.
Here, all datasets that are decreased by at least 20% in size are stored with the compress option. Otherwise, we wouldn't be able to work at all with the current setup (storage and server).
Let me guess: you never ran a comparison test, did you?
The COMPRESS option is the best way of dealing with your space issue. Many SAS sites now have it switched on by default, including us (COMPRESS = BINARY).
Hi,
Try this Sample code from support.sas.com Sample 24804: %SQUEEZE-ing Before Compressing Data, Redux
Thanks,
Ahmed
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
