Size differences of exported file form SAS

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Size differences of exported file form SAS

Hi, All

 

I made two different file types (xlsx and txt : csv) by using the option, export from a SAS project.


I believe that the size of a text file is smaller than excel file only except large dataset over a million rows (becuase excel file can store only 2x2^20 - around a million rows).


However, when I export data from SAS with these two types of file by using a same data, I found that some dataset saved as excel(.xlsx) is much smaller than text type.


Is there any automatically applied compression process when the dataset is exported as xlsx from SAS?


I tried that matching up data types (numerical and char fields) of columns and other possible options, but I still do not get a clear explaination on that.


If you have any suspicous part and possible ideas on this, please drop a hint. Smiley Happy

I really appreciate any help from you all. 

 


Accepted Solutions
Solution
Thursday
Super User
Posts: 9,880

Re: Size differences of exported file form SAS

The modern Office files (since Microsoft finally saw the light and followed the example set by OpenOffice) are groups of xml files contained in a single .zip archive. XML is text with tags, so the files would initially be MUCH larger than ordinary text files, but the zip compression takes care of that very nicely.

How much compression rate you get is dependent on the contents, so there is no clear cut answer.

 

But note that text files are inherently better for any kind of data transfer. A text file is a text file is a text file, eg the csv format has not changed for decades, and will not change in the foreseeable future. You don't need a special version of some special software to read it, any text editor is sufficient. The same is demonstrably not true for Office files.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

View solution in original post


All Replies
PROC Star
Posts: 2,312

Re: Size differences of exported file form SAS

xlsx files are actually zip files.

 

Rename your xlsx file to a zip extension and open it with your decompressor of choice to verify this fact.

New Contributor
Posts: 4

Re: Size differences of exported file form SAS

Thank you for your tip. I followed that your posting and I figured out the anatomy of excel files.

Highlighted
Super User
Posts: 23,259

Re: Size differences of exported file form SAS


I believe that the size of a text file is smaller than excel file only except large dataset over a million rows (becuase excel file can store only 2x2^20 - around a million rows).

 

I think that’s incorrect and definitely more complicated than a simple rule. It’ll also depend on the variable types, number of columns and wether the Excel file has graphs or any ther formatting. 

New Contributor
Posts: 4

Re: Size differences of exported file form SAS

Thank your for your reply.


I understand that the file size is related to many complicate rules as you mentioned.
Since I handled very limited columns and data types (only few text and numbers) in this case, I had a question with a simple idea.

Thank you for your advice!

Solution
Thursday
Super User
Posts: 9,880

Re: Size differences of exported file form SAS

The modern Office files (since Microsoft finally saw the light and followed the example set by OpenOffice) are groups of xml files contained in a single .zip archive. XML is text with tags, so the files would initially be MUCH larger than ordinary text files, but the zip compression takes care of that very nicely.

How much compression rate you get is dependent on the contents, so there is no clear cut answer.

 

But note that text files are inherently better for any kind of data transfer. A text file is a text file is a text file, eg the csv format has not changed for decades, and will not change in the foreseeable future. You don't need a special version of some special software to read it, any text editor is sufficient. The same is demonstrably not true for Office files.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
New Contributor
Posts: 4

Re: Size differences of exported file form SAS

Posted in reply to KurtBremser

From your posting,

I figured out the difference of xlsx file with my dataset(changed to .zip and checked the generated files in xml)

There were several common data in the dataset, and those were written in sharedStrings.xml of the archive.

now I understood that what happend on files.

It answered my questions and I learned a lot.

 

I appreciate with your tips.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 109 views
  • 0 likes
  • 4 in conversation