BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

Hi Team,

When I ran a Proc contents of my dataset the following is the result:

                                                          Observations          52829
                                                            Variables             32
                                                            Indexes               0
                                                            Observation Length    592
  Deleted Observations 0
                                                              Compressed            NO
                                                              Sorted                NO


Data Set Page Size          16384
Number of Data Set Pages    1957
First Data Page             1
Max Obs per Page            27
Obs in First Data Page      19

When I exported to XML file the size was  147056KB

This is a pretty BIG FILE.

Do you have any suggestions to reduce the size of the file???

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Open in Excel and Save AS XLS or XLSX. Then delete the XML.

Open an XML file in a text editor and you'll see why it really isn't a good interchange format for large datasets (IMHO).

View solution in original post

4 REPLIES 4
ballardw
Super User

Open in Excel and Save AS XLS or XLSX. Then delete the XML.

Open an XML file in a text editor and you'll see why it really isn't a good interchange format for large datasets (IMHO).

robertrao
Quartz | Level 8

Hi ,

Thanks for the reply. I created an XML file(147056KB/115MB) and saved it as EXcel (size got reduced to 8115KB).

Is this size OK? !!

Regards

robertrao
Quartz | Level 8

Hi,

So we could decrease it by 10 times compared to XML??

Regards

TomKari
Onyx | Level 15

Hi, Karun

You can very quickly use some rules of thumb for the data volumes of your original data:

You have 32 variables, and a SAS numeric variable is 8 bytes, so if they were all numeric each observation would need 8*32, or 256 bytes. You can see that your observation length is 592 bytes, so there are some character variables in there that are longer than 8.

Using your 592 byte observation length, and your 52,829 observations, SAS needs somewhere around 31 MB to store the data (there are always overheads and inefficiencies, so these numbers are never exact). This is perfectly reasonable, and won't cause you any problems.

As you noted, when you convert it to XML the file size balloons to 147 MB. Ballardw's comment is exactly correct; XML is a very flexible medium, but the price of that is that it required enormous amounts of storage. I'm surprised that your XML dataset isn't even bigger.

Native Excel is much more efficient in storage terms, so your dataset drops again to 8MB (even smaller than in SAS). The price you pay for this efficiency is that Excel files can't be used in as many different places as an XML file.

Finally, it's very likely that the size of your SAS dataset could be reduced, maybe significantly. This would involve analyzing your character variables, and reducing the length to that of the longest value, and possibly reducing the length of your numeric variables, based on the characteristics of the contents. But it's a lot of work, and really not worth it for only 52,000 observations.

Hope this answers your questions.

Tom

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 816 views
  • 0 likes
  • 3 in conversation