
Size of the zipped data set is different from its size before it was zipped

Contributor
Posts: 23


We zipped a data set using:

- call system("gzip path/data.sas7bdat")

However, when we unzipped it using:

- call system("gzip -d path/data.sas7bdat")

we noticed that its size was different from before it was zipped. Does this mean that something in the data set changed? We compared it against its backup data set using proc compare and found no discrepancies, yet the file sizes differ. What is the reason for this?

 

Super User
Posts: 10,594

Re: Size of the zipped data set is different from its size before it was zipped



That wouldn't work. gzip creates

path/data.sas7bdat.gz

while

gzip -d path/data.sas7bdat

will find that the sas7bdat is not a gzipped file and would fail. Since gzip removes the source file, some other process must have recreated the sas7bdat. So I suggest you run

mv path/data.sas7bdat path/data_new.sas7bdat
gzip -d path/data.sas7bdat.gz

and then visually compare the sizes and timestamps of data_new.sas7bdat and data.sas7bdat, as gzip preserves both through the whole round trip.

I've never seen the physical file size change after a gzip / gzip -d round trip.
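
To check the round trip yourself, here is a minimal sketch. It assumes a POSIX shell with gzip, mktemp and cksum available, and it uses a scratch file filled with dummy text as a stand-in for the real data set (the file name and contents are made up for illustration, it is not a real sas7bdat):

```shell
#!/bin/sh
# Scratch directory and a hypothetical stand-in for path/data.sas7bdat.
workdir=$(mktemp -d)
file="$workdir/data.sas7bdat"

# Fill the file with repeatable dummy text (NOT a real SAS data set).
i=0
while [ "$i" -lt 2000 ]; do
    printf 'row %d some sample text\n' "$i"
    i=$((i + 1))
done > "$file"

# Record byte count and checksum before compressing.
size_before=$(wc -c < "$file" | tr -d ' ')
sum_before=$(cksum < "$file")

gzip "$file"          # replaces data.sas7bdat with data.sas7bdat.gz
gzip -d "$file.gz"    # restores data.sas7bdat and removes the .gz

# Record byte count and checksum after decompressing.
size_after=$(wc -c < "$file" | tr -d ' ')
sum_after=$(cksum < "$file")

echo "before: $size_before bytes, after: $size_after bytes"

# Remove the scratch directory.
rm -rf "$workdir"
```

If the two sizes and checksums match here but differ for your real file, the difference most likely comes from some other step and not from gzip.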

Contributor
Posts: 23

Re: Size of the zipped data set is different from its size before it was zipped

Posted in reply to KurtBremser
My mistake, I mixed it up. We encountered this issue with cport/cimport, not with gzip.