BookmarkSubscribeRSS Feed
Pallavi_Varsh
Calcite | Level 5

I am creating a sas dataset with around 85-90 columns in it and some of columns are of big size (2048) so the dataset is coming of very large size i.e. 1331622 observations are coming as of 14.9 GB size. I have a limitation that file size should be max of 4 GB so that it can pass through some transferring channel. Please let me know if there is any way to reduce the size of dataset without using COMPRESS option and the column length should be same as what available in database(means the column length and datatype should be same as that of database through which I am fetching data). .

10 REPLIES 10
EricHoogenboom
Fluorite | Level 6

If you are not allowed to compress (why?) then splitting your dataset is the only option I can think of.

* untested;

data part1 part2 part3;

   set have;

   if _N_ <= 400000 then output part1;

   else if _N_ <= 800000 then output part2;

   else output part3;

run;

* and glue them together at the other end of the line;

data want;

   set part1 part2 part3;

run;

Hth,

Eric

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, this really depends on various things.  You haven't shown an example of the data so am just guessing here but some things to consider:

Apply coding to you dataset.  For instance, those columns which have long text strings, are they similar strings?  If so then pull a distinct list out and then apply as a format.  That alone will drastically reduce your dataset.  Code any other possible variables, use formats, or related datasets, for instance, consider:

ID          VAR1          BP          HR          COMMENTS     AGE     SOMETHINGELSE     ANOTHERCOMMENT

...

Now the above could all go in one dataset, but is that the most efficient use of space?  Perhaps have a comments dataset separate from this and assign a linking variable.  Maybe BP and HR could be moved to a separate vitals signs dataset using ID?  You can also split the dataset as EricHoogenboom has said. 

There are other methods as well, why for instance, does the transfer only allow 4gb?  Doesn't seem to be a valid reason.  Are you sending by email, that in itself is a bad idea.  Use a controlled portal to transfer data.  Use a controlled web portal and stream CSV data across is another method (pro http will download a stream of CSV data for instance).

Also, have you tried Zipping the file, I.e. use a proper ZIP program (not windows compressed folder) and choose maximum compression.  Or try XPT from SAS. 

As you can see there are a variety of methods, both coding, and utility programs for such a task depending on your situation.

Pallavi_Varsh
Calcite | Level 5

Hi,

I am not using compress because compress option increases the number of cpu cycle for accessing the data and it can impact the performance of an application that is why I am looking for some alternative.

Thanks,

Pallavi

jakarman
Barite | Level 11

Newbie did you try the compressing and prove the results wuiht your expectations ? With a spinning disk a compress=binary you could effectively win a lot time just by all that IO overhead. I have seen an increase with a total record-size of 80bytes and having a SSD. that number of 1,3M record is not that big  you must have some 10Kbyte recordsize.   I am Expecting just performance gains.      

---->-- ja karman --<-----
SASKiwi
PROC Star

In our experience the slightly increased overhead in CPU is more than compensated by the big savings in IO, resulting in much faster run times.

Kurt_Bremser
Super User

As the others said, your additional CPU cycles will be offset by the increased I/O performance. Rule of thumb: with the exception of the STAT procedures, SAS is always I/O bound.

Here, all datasets that are decreased by at least 20% in size are stored with the compress option. Otherwise, we wouldn't be able to work at all with the current setup (storage and server).

SASKiwi
PROC Star

The COMPRESS option is the best way of dealing with your space issue. Many SAS sites now have it switched on by default, including us (COMPRESS = BINARY).

AhmedAl_Attar
Rhodochrosite | Level 12

Hi,

Try this Sample code from support.sas.com Sample 24804: %SQUEEZE-ing Before Compressing Data, Redux

Thanks,

Ahmed

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1499 views
  • 2 likes
  • 7 in conversation