10-29-2012 06:08 PM
When I ran PROC CONTENTS on my dataset, this was the result:
Observation Length 592
Deleted Observations 0
Data Set Page Size 16384
Number of Data Set Pages 1957
First Data Page 1
Max Obs per Page 27
Obs in First Data Page 19
When I exported it to an XML file, the size was 147,056 KB.
This is a pretty BIG FILE.
Do you have any suggestions to reduce the size of the file?
10-30-2012 10:26 AM
You can quickly apply some rules of thumb to estimate the data volume of your original dataset:
You have 32 variables, and a SAS numeric variable is 8 bytes, so if they were all numeric each observation would need 8*32, or 256 bytes. Your observation length is actually 592 bytes, so some of them must be character variables longer than 8 bytes.
Using your 592-byte observation length and your 52,829 observations, SAS needs somewhere around 31 MB to store the data (there are always overheads and inefficiencies, so these numbers are never exact). This is perfectly reasonable, and won't cause you any problems.
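To make the estimate concrete, a DATA _NULL_ step like the one below reproduces the arithmetic (a minimal sketch; the observation and row counts are the ones quoted above):

```
/* Rough size estimate: observation length * number of observations */
data _null_;
  obs_len = 592;        /* observation length in bytes */
  n_obs   = 52829;      /* number of observations */
  est_mb  = obs_len * n_obs / 1e6;   /* approximate size in MB */
  put 'Estimated uncompressed size: ' est_mb 8.1 ' MB';
run;
```

This ignores page overhead and header space, which is why the actual dataset is a little larger than the raw product.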
As you noted, when you convert it to XML the file size balloons to 147 MB. Ballardw's comment is exactly correct; XML is a very flexible medium, but the price of that flexibility is that it requires enormous amounts of storage. I'm surprised your XML file isn't even bigger.
Native Excel is much more efficient in storage terms, so your dataset drops to 8 MB (even smaller than in SAS). The price you pay for this efficiency is that Excel files can't be used in as many different places as an XML file can.
Finally, it's very likely that the size of your SAS dataset could be reduced, maybe significantly. This would involve analyzing your character variables and reducing each one's length to that of its longest actual value, and possibly reducing the length of your numeric variables based on the characteristics of their contents. But it's a lot of work, and really not worth it for only 52,000 observations.
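If you did want to try it, the general approach looks something like this (a sketch only; the dataset name WORK.MYDATA and the variable NAME are hypothetical stand-ins for your own):

```
/* Step 1: find the longest actual value of a character variable */
proc sql;
  select max(length(name)) as maxlen
  from work.mydata;
quit;

/* Step 2: rebuild the dataset with the shorter length.
   The LENGTH statement must come before SET to take effect. */
data work.mydata_small;
  length name $ 20;   /* substitute the MAXLEN found in step 1 */
  set work.mydata;
run;
```

You would repeat this for each character variable worth shrinking; note that declaring a variable in the LENGTH statement before SET also moves it to the front of the variable order.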
Hope this answers your questions.