File becomes bigger after PROC CPORT

Contributor
Posts: 21

I ran PROC CPORT on my data set expecting the file to become smaller, but the opposite happened: the resulting .xpt file is bigger than the original. What is the reason for this, and is it common?

Super User
Posts: 3,926

Re: File becomes bigger after PROC CPORT

CPORT files are standardised so they can be easily exchanged between computers with different operating systems and/or different SAS versions. They are not optimised to reduce storage. If the SAS datasets you CPORT were originally compressed, then I'm not surprised they end up bigger.

Also, CPORT files have a fixed block size, which is another reason they could be bigger:


Verifying That the Communications Software Has Not Changed File Attributes

Verify that your communications software does not change file attributes. Here are the required attributes with values:
  • Logical record length (LRECL): 80 or an integer that is a multiple of 80 (for example, 160, 240, 320)
  • Block size (BLKSIZE): 8000 blocks
  • Record format (RECFM): Fixed block

Super User
Posts: 9,599

Re: File becomes bigger after PROC CPORT

What is it you're actually trying to achieve here, in terms of this post and your other one on gzip? Transport files are just that: a method of transporting a file from one operating system to another. They are not compressed files. What do you need compressed files for? What is the process you are trying to achieve?

Super User
Posts: 10,280

Re: File becomes bigger after PROC CPORT

The xpt format is used to transport data between operating systems and SAS versions; it is not meant for reducing storage.

If you need a radical reduction in used space, use gzip or similar.

Reasoning:

  • both methods require one step to make the file readable for SAS again (proc cimport, or gzip -d)
  • the compression achieved with gzip is much better than that of internal SAS means (compress=yes, compress=binary)
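
To check on a single file that gzip gives back exactly what went in, a round trip can be scripted. This is only a minimal sketch in Python, not something from this thread: the function names (`sha256_of`, `gzip_round_trip`) and the `.restored` suffix are made up for illustration, and it assumes there is enough free space in the work directory for both the compressed and the restored copy.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_round_trip(source, work_dir):
    """Compress `source`, decompress it again, and report sizes plus a match flag."""
    source = Path(source)
    work_dir = Path(work_dir)
    packed = work_dir / (source.name + ".gz")          # hypothetical naming
    restored = work_dir / (source.name + ".restored")  # hypothetical naming

    # Compress: stream the original into a gzip file.
    with source.open("rb") as src, gzip.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Decompress: stream the gzip file back into a plain file.
    with gzip.open(packed, "rb") as src, restored.open("wb") as dst:
        shutil.copyfileobj(src, dst)

    return {
        "original_bytes": source.stat().st_size,
        "gzip_bytes": packed.stat().st_size,
        "identical": sha256_of(source) == sha256_of(restored),
    }
```

Calling `gzip_round_trip("mydata.sas7bdat", "/some/scratch/dir")` (both paths are placeholders) returns the two sizes and whether the restored file is byte-identical to the original. How much smaller the `.gz` file is depends entirely on the data.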
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 21

Re: File becomes bigger after PROC CPORT

Thanks to all who responded to my questions.

Our current situation is that we will soon be migrating from SAS 9.3 to 9.4, and we need to compress our data sets as much as we can because we're running out of space on our server. We looked for ways to compress the files and decided to combine proc cport and gzip, since in a test on one data set its size went from 65GB to just 212MB. However, when we tested this on other data sets we noticed inconsistencies: after gzip -d and proc cimport, the size of the data set isn't the same as it was before compressing. This concerned us, because the restored data set might not be fully equal to the original one. So for now we're just using gzip, since the data set has always come back at its original size after unzipping.

Super User
Posts: 3,926

Re: File becomes bigger after PROC CPORT

You can use PROC COMPARE to ensure that your migrated data is exactly the same as your original data. Create a file share to your 9.3 data from your new SAS server so you can easily compare tables. If you are changing operating systems that might make it a bit harder to compare. Please advise us what operating systems both of your SAS servers run on.

Contributor
Posts: 21

Re: File becomes bigger after PROC CPORT

We used proc compare after cimport and it found no differences from the original table. However, the file sizes differ. I think one reason for the difference is that the compress option was used when the original table was created. When we cport and cimport the table, the file gets larger, but proc compare reports no issues. So is it still safe to conclude that the cimported data is exactly the same as the original data?

Super User
Posts: 10,280

Re: File becomes bigger after PROC CPORT

When the report from proc compare finds no differences, you're good. But that works on the logical (content) side of things. Physical file parameters are not checked, and later SAS versions tend to use a larger pagesize, which results in different file layouts and sizes.

If the original datasets were compressed, proc cport/cimport should honor that and compress the imported datasets.

Super User
Posts: 10,280

Re: File becomes bigger after PROC CPORT

I would strongly advise to do a direct jump to the current SAS version (9.4M5). Otherwise you'll have to do all the work again RSN. There are so many important additions in 9.4 that it makes the intermediate step to 9.3 just a waste of time.

We did 9.2 to 9.4 recently, and since both servers run AIX, we just had to copy the files as they are.

Super User
Posts: 9,599

Re: File becomes bigger after PROC CPORT

Posted in reply to KurtBremser

My thoughts exactly, no reason to migrate to anything other than 9.4 at this moment in time.  And yes, only the 32bit v 64bit is the real killer but that should only affect catalogs (which are bad bad bad!). 

Also, if your server is running out of space, consider upgrading it before migration.  You can buy thousands of TB now so cheap it is untrue.

I assume you have the whole thing planned out, md5 hash checks on base files versus moved, backed up original, proc compares, code testing etc?
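
For the hash-check part, something along these lines would do for files that are copied as they are. It is a rough sketch in Python under assumptions of my own: the helper names `md5_of` and `compare_trees` are invented here, and both directory trees are assumed to be reachable from the machine running the script. Datasets that are rebuilt (for example via cport/cimport) will not be byte-identical even when the content matches, so those still need proc compare.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(base_dir, moved_dir):
    """Return relative paths of base files that are missing or differ in moved_dir."""
    base_dir = Path(base_dir)
    moved_dir = Path(moved_dir)
    problems = []
    for base_file in sorted(p for p in base_dir.rglob("*") if p.is_file()):
        rel = base_file.relative_to(base_dir)
        moved_file = moved_dir / rel
        # Flag the file if the copy is absent or its digest does not match.
        if not moved_file.is_file() or md5_of(base_file) != md5_of(moved_file):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `compare_trees(old_root, new_root)` means every file under the old root has a byte-identical copy under the new one. Extra files that exist only on the new side are not reported.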

Super User
Posts: 10,280

Re: File becomes bigger after PROC CPORT

And some more thoughts regarding space issues:

  • use the compress options on all datasets, and look at the log to see if it has a reasonably positive effect (3% is not worth it)
  • expanding disk storage is not such an issue, given today's price/TB, even with high-end devices (SSD over FC)
  • for the sake of manageability, you should reduce the number of datasets present for online processing to what's really needed. Once you have the need to balance long term keeping vs storage availability, implement a suitable archival system. With us, this is handled by TSM.

And finally, I advise to not migrate in place on a given server. Set up a new one with the current SAS version, copy/migrate your data (depending on involved operating system(s), bitness or encoding issues), get it up and running, and once you've verified it's OK, switch your users over and decommission the old one. It's the most painless method for you and your users. As long as you run production on only one server, SAS (in my experience) won't object to having two installations in parallel for a limited time.

Contributor
Posts: 21

Re: File becomes bigger after PROC CPORT

My mistake guys sorry. We're migrating from 9.3 to 9.4

Super User
Posts: 10,280

Re: File becomes bigger after PROC CPORT


@iSAS wrote:

My mistake guys sorry. We're migrating from 9.3 to 9.4


Now that's good news ;-)

My other thoughts still apply, of course.
