BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JayeL
Calcite | Level 5

Hi, All

 

I made two different file types (xlsx and txt : csv) by using the option, export from a SAS project.


I believe that the size of a text file is smaller than excel file only except large dataset over a million rows (becuase excel file can store only 2x2^20 - around a million rows).


However, when I export data from SAS with these two types of file by using a same data, I found that some dataset saved as excel(.xlsx) is much smaller than text type.


Is there any automatically applied compression process when the dataset is exported as xlsx from SAS?


I tried that matching up data types (numerical and char fields) of columns and other possible options, but I still do not get a clear explaination on that.


If you have any suspicous part and possible ideas on this, please drop a hint. 🙂

I really appreciate any help from you all. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The modern Office files (since Microsoft finally saw the light and followed the example set by OpenOffice) are groups of xml files contained in a single .zip archive. XML is text with tags, so the files would initially be MUCH larger than ordinary text files, but the zip compression takes care of that very nicely.

How much compression rate you get is dependent on the contents, so there is no clear cut answer.

 

But note that text files are inherently better for any kind of data transfer. A text file is a text file is a text file, eg the csv format has not changed for decades, and will not change in the foreseeable future. You don't need a special version of some special software to read it, any text editor is sufficient. The same is demonstrably not true for Office files.

View solution in original post

6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

xlsx files are actually zip files.

 

Rename your xlsx file to a zip extension and open it with your decompressor of choice to verify this fact.

JayeL
Calcite | Level 5

Thank you for your tip. I followed that your posting and I figured out the anatomy of excel files.

Reeza
Super User


I believe that the size of a text file is smaller than excel file only except large dataset over a million rows (becuase excel file can store only 2x2^20 - around a million rows).

 

I think that’s incorrect and definitely more complicated than a simple rule. It’ll also depend on the variable types, number of columns and wether the Excel file has graphs or any ther formatting. 

JayeL
Calcite | Level 5

Thank your for your reply.


I understand that the file size is related to many complicate rules as you mentioned.
Since I handled very limited columns and data types (only few text and numbers) in this case, I had a question with a simple idea.

Thank you for your advice!

Kurt_Bremser
Super User

The modern Office files (since Microsoft finally saw the light and followed the example set by OpenOffice) are groups of xml files contained in a single .zip archive. XML is text with tags, so the files would initially be MUCH larger than ordinary text files, but the zip compression takes care of that very nicely.

How much compression rate you get is dependent on the contents, so there is no clear cut answer.

 

But note that text files are inherently better for any kind of data transfer. A text file is a text file is a text file, eg the csv format has not changed for decades, and will not change in the foreseeable future. You don't need a special version of some special software to read it, any text editor is sufficient. The same is demonstrably not true for Office files.

JayeL
Calcite | Level 5

From your posting,

I figured out the difference of xlsx file with my dataset(changed to .zip and checked the generated files in xml)

There were several common data in the dataset, and those were written in sharedStrings.xml of the archive.

now I understood that what happend on files.

It answered my questions and I learned a lot.

 

I appreciate with your tips.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1658 views
  • 0 likes
  • 4 in conversation