I tried PROC CPORT on my data set expecting the file size to become smaller, but the opposite happened: the resulting file, in .xpt format, became bigger. May I know the reason behind this? Is this issue common?
CPORT files are standardised so they can be easily exchanged between computers with different operating systems and/or different SAS versions. They are not optimised to reduce storage. If the SAS datasets you CPORT were originally compressed, then I'm not surprised they end up bigger.
Also, CPORT files have a fixed block size, which is yet another reason they could be bigger:
Logical record length (LRECL) | 80 or an integer that is a multiple of 80 (for example, 160, 240, 320)
Block size (BLKSIZE) | 8000 blocks
Record format (RECFM) | Fixed block
What is it you're actually trying to achieve here, in terms of this post and your other one on Gzip? Transport files are just that: a method of transporting a file from one operating system to another. They are not compressed files. What do you need compressed files for? What is the process you are trying to achieve?
The xpt format is used for transport of data between operating systems and SAS versions; it is not meant to be used for reducing storage.
If you need a radical reduction in used space, use gzip or similar.
Reasoning:
Thanks to all who responded to my questions.
Our current situation is that we will soon be migrating from SAS 9.3 to 9.4, and we need to compress our data sets as much as we can since we're running out of space on our server. We looked for ways to compress the files and decided to combine proc cport and gzip, since we tested this on one data set and its size went from 65GB to just 212MB. However, when testing this on other data sets, we noticed inconsistencies: after using proc cimport and gzip -d, the size of the data set isn't the same as it was before compressing. This concerned us, since the data set might not be fully equal to the original one. So for now we're just using gzip, since it has never changed the size of a data set after unzipping.
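For what it's worth, the gzip-only round trip can be checked byte for byte rather than by file size alone. The following is a minimal Python sketch, not part of anyone's actual process in this thread: it compresses a file, decompresses it again, and compares size and SHA-256 digest. The function names and the stand-in sample file are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import gzip
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_round_trip_ok(src_path, work_dir):
    """Compress src_path, decompress it again, and compare both copies."""
    packed = os.path.join(work_dir, "copy.gz")
    restored = os.path.join(work_dir, "copy.restored")
    with open(src_path, "rb") as src, gzip.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    with gzip.open(packed, "rb") as src, open(restored, "wb") as dst:
        shutil.copyfileobj(src, dst)
    same_size = os.path.getsize(src_path) == os.path.getsize(restored)
    same_hash = sha256_of(src_path) == sha256_of(restored)
    return same_size and same_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a real data set file.
        sample = os.path.join(tmp, "sample.sas7bdat")
        with open(sample, "wb") as handle:
            handle.write(b"stand-in bytes for a data set file " * 1000)
        print(gzip_round_trip_ok(sample, tmp))
```

A check like this only says the file survived compression unchanged. It says nothing about a CPORT/CIMPORT round trip, where the file is rebuilt and the bytes are expected to differ.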
You can use PROC COMPARE to ensure that your migrated data is exactly the same as your original data. Create a file share to your 9.3 data from your new SAS server so you can easily compare tables. If you are changing operating systems that might make it a bit harder to compare. Please advise us what operating systems both of your SAS servers run on.
We used proc compare after cimport and there is no difference from the original table. However, the files have different sizes. I think one of the reasons for the difference in size is that the compress option was used when the original table was created. When we cport and cimport the table, the file gets larger, but proc compare shows no issue. So is it still safe to conclude that the cimported data is exactly the same as the original data?
When the report from proc compare finds no differences, you're good. But that works on the logical (content) side of things. Physical file parameters are not checked, and later SAS versions tend to use a larger pagesize, which results in different file layouts and sizes.
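To illustrate why identical content can occupy different amounts of disk space, here is a deliberately simplified Python model. It assumes an uncompressed data set stored as whole fixed-size pages, each holding a whole number of observations, and it ignores header and per-page overhead. The function name and the numbers are made up for illustration and are not taken from any real data set.

```python
import math


def estimated_data_bytes(n_obs, obs_length, page_size):
    """Simplified model: whole observations per page, whole pages per file."""
    obs_per_page = page_size // obs_length
    if obs_per_page == 0:
        raise ValueError("page_size must hold at least one observation")
    pages = math.ceil(n_obs / obs_per_page)
    return pages * page_size


if __name__ == "__main__":
    # Same hypothetical content (1000 observations of 100 bytes each),
    # stored with two different page sizes.
    for page_size in (4096, 65536):
        print(page_size, estimated_data_bytes(1000, 100, page_size))
```

Under these assumptions the same 1000 observations take a different number of bytes depending only on the page size, which is why file size alone is a poor test of whether the data is the same.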
If the original datasets were compressed, proc cport/cimport should honor that and compress the imported datasets.
I would strongly advise to do a direct jump to the current SAS version (9.4M5). Otherwise you'll have to do all the work again RSN. There are so many important additions in 9.4 that it makes the intermediate step to 9.3 just a waste of time.
We did 9.2 to 9.4 recently, and since both servers run AIX, we just had to copy the files as they are.
My thoughts exactly, no reason to migrate to anything other than 9.4 at this moment in time. And yes, only the 32bit v 64bit is the real killer but that should only affect catalogs (which are bad bad bad!).
Also, if your server is running out of space, consider upgrading it before migration. You can buy thousands of TB now so cheap it is untrue.
I assume you have the whole thing planned out, md5 hash checks on base files versus moved, backed up original, proc compares, code testing etc?
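The md5 check on base files versus moved files could look roughly like the Python sketch below. It assumes the files were copied as they are (no CPORT/CIMPORT in between, since that changes the physical file), and the function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_dir, target_dir):
    """Return (relative path, problem) pairs for missing or differing copies."""
    problems = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(target_dir, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif md5_of(src) != md5_of(dst):
                problems.append((rel, "differs"))
    return sorted(problems)
```

Calling `compare_trees` with the old and the new library directory returns an empty list when every file in the old directory has a byte-identical copy in the new one. Files that exist only in the new directory are not reported.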
And some more thoughts regarding space issues:
And finally, I advise to not migrate in place on a given server. Set up a new one with the current SAS version, copy/migrate your data (depending on involved operating system(s), bitness or encoding issues), get it up and running, and once you've verified it's OK, switch your users over and decommission the old one. It's the most painless method for you and your users. As long as you run production on only one server, SAS (in my experience) won't object to having two installations in parallel for a limited time.
My mistake guys sorry. We're migrating from 9.3 to 9.4
@iSAS wrote:
My mistake guys sorry. We're migrating from 9.3 to 9.4
Now that's good news 😉
My other thoughts still apply, of course.