DATA Step, Macro, Functions and more

Proc Export

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,041
Accepted Solution

Proc Export

Hi Team,

When I ran a Proc contents of my dataset the following is the result:

                                                          Observations          52829
                                                            Variables             32
                                                            Indexes               0
                                                            Observation Length    592
  Deleted Observations 0
                                                              Compressed            NO
                                                              Sorted                NO


Data Set Page Size          16384
Number of Data Set Pages    1957
First Data Page             1
Max Obs per Page            27
Obs in First Data Page      19

When I exported to XML file the size was  147056KB

This is a pretty BIG FILE.

Do you have any suggestions to reduce the size of the file???

Thanks


Accepted Solutions
Solution
‎10-29-2012 06:10 PM
Super User
Posts: 11,343

Re: Proc Export

Posted in reply to robertrao

Open in Excel and Save AS XLS or XLSX. Then delete the XML.

Open an XML file in a text editor and you'll see why it really isn't a good interchange format for large datasets (IMHO).

View solution in original post


All Replies
Solution
‎10-29-2012 06:10 PM
Super User
Posts: 11,343

Re: Proc Export

Posted in reply to robertrao

Open in Excel and Save AS XLS or XLSX. Then delete the XML.

Open an XML file in a text editor and you'll see why it really isn't a good interchange format for large datasets (IMHO).

Super Contributor
Posts: 1,041

Re: Proc Export

Hi ,

Thanks for the reply. I created an XML file(147056KB/115MB) and saved it as EXcel (size got reduced to 8115KB).

Is this size OK? !!

Regards

Super Contributor
Posts: 1,041

Re: Proc Export

Hi,

So we could decrease it by 10 times compared to XML??

Regards

PROC Star
Posts: 1,167

Re: Proc Export

Posted in reply to robertrao

Hi, Karun

You can very quickly use some rules of thumb for the data volumes of your original data:

You have 32 variables, and a SAS numeric variable is 8 bytes, so if they were all numeric each observation would need 8*32, or 256 bytes. You can see that your observation length is 592 bytes, so there are some character variables in there that are longer than 8.

Using your 592 byte observation length, and your 52,829 observations, SAS needs somewhere around 31 MB to store the data (there are always overheads and inefficiencies, so these numbers are never exact). This is perfectly reasonable, and won't cause you any problems.

As you noted, when you convert it to XML the file size balloons to 147 MB. Ballardw's comment is exactly correct; XML is a very flexible medium, but the price of that is that it required enormous amounts of storage. I'm surprised that your XML dataset isn't even bigger.

Native Excel is much more efficient in storage terms, so your dataset drops again to 8MB (even smaller than in SAS). The price you pay for this efficiency is that Excel files can't be used in as many different places as an XML file.

Finally, it's very likely that the size of your SAS dataset could be reduced, maybe significantly. This would involve analyzing your character variables, and reducing the length to that of the longest value, and possibly reducing the length of your numeric variables, based on the characteristics of the contents. But it's a lot of work, and really not worth it for only 52,000 observations.

Hope this answers your questions.

Tom

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 235 views
  • 0 likes
  • 3 in conversation