I'm using PROC REPORT with ODS and the ExcelXP tagset. My data generates one report page for every record. The XML generated is fairly verbose. My UNIX-based (AIX, actually) SAS environment keeps running out of memory, according to the message I'm getting in the log:
ERROR: The SAS System stopped processing this step because of insufficient memory.
I can only run about 500 records through at a time or I'll bomb out on memory. I guess the XML as it is being generated is written to some in-core memory location? I am using SAS EG 7.1 to submit my SAS code to UNIX.
At 600 records, I get the following if it's any help in diagnosing:
ERROR: There is not enough memory to perform class method OM_NEW(3) of "COLLECTION.LISTITERATOR".
I'm suspicious that this may be an issue of communication between SAS EG and UNIX. Has anyone experienced anything like this before? Is there a way I can *not* have my ODS output go back to SAS EG? Presumably that's what's causing the memory problem.
Jim
I assume when you say page you really mean SHEET.
Why don't you try asking for more memory in your UNIX batch job? -memsize '4g' perhaps. Look at the documentation for the correct syntax and option.
What version of SAS are you using?
XML is verbose, but other options depend on version
I'm using SAS 9.3 as follows:
NOTE: Copyright (c) 2002-2010 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.3 (TS1M2)
NOTE: This session is executing on the AIX 7.1 (AIX 64) platform.
This does not appear to be related to Enterprise Guide. I still get insufficient memory messages when submitting directly on UNIX/AIX. The message is a little different this time:
NOTE: Insufficient memory to allocate parsing stack. Program will be terminated.
ERROR: The SAS System stopped processing this step because of insufficient memory.
I hadn't seen the bit about the parsing stack before.
I notice that as my SAS program runs, the log file gets progressively bigger until the program completes but the xml file stays at size 0 and (apparently) doesn't get written out until at or near program completion. Presumably the xml is getting written to some location in memory as PROC REPORT executes and is only written out to final text file when ODS _ALL_ CLOSE is reached. I don't suppose there's any way to get ODS to progressively/periodically dump the xml to the text file on UNIX is there?
I suppose I can write a macro to only feed X number of records at a time to PROC REPORT, save the xml in a lot of different little files, and then write a DATA step to parse the xml and stitch everything back together again. I've done xml parsing and stitching before, but yuck. 😞
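The chunking workaround described above can be sketched as a macro loop; this is a minimal illustration, and the dataset name (WORK.ALL), macro name, and output paths are all hypothetical placeholders:

```sas
/* Sketch of feeding N records at a time to PROC REPORT, writing
   one small XML file per chunk. Names and paths are placeholders. */
%macro chunked_report(chunk=500);
  /* count how many chunks are needed */
  proc sql noprint;
    select ceil(count(*)/&chunk) into :nchunks trimmed
    from work.all;
  quit;

  %do i = 1 %to &nchunks;
    ods tagsets.excelxp file="report_&i..xml";
    proc report data=work.all(firstobs=%eval((&i-1)*&chunk+1)
                              obs=%eval(&i*&chunk));
      /* column / define statements as in the full report */
    run;
    ods tagsets.excelxp close;
  %end;
%mend chunked_report;

%chunked_report(chunk=500);
```

Each ODS close flushes that chunk to disk, so memory use stays bounded by the chunk size rather than the total record count; the downside, as noted, is stitching the resulting XML files back together afterwards.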
I think there are too many possible issues to be able to help you.
Tech support should be able to.
One possible option is to customize the tagset and strip it down to only what you need. The XML files are large, and when saved as native Excel files the size shrinks.
The xml is definitely verbose. The xml from one recent run was 30 MB, but when I saved the same as xlsx, it was only 2 MB.
The size of the file is mostly governed by the number of records. The header information isn't particularly large. Here's a typical row:
<Row ss:AutoFitHeight="1">
  <Cell ss:StyleID="data_4" ss:Index="1"><Data ss:Type="String">Month 09:</Data></Cell>
  <Cell ss:StyleID="data" ss:Index="2"><Data ss:Type="Number">-16</Data></Cell>
  <Cell ss:StyleID="data" ss:Index="3"><Data ss:Type="Number">-16</Data></Cell>
  <Cell ss:StyleID="data" ss:Index="4"><Data ss:Type="Number">-16</Data></Cell>
  <Cell ss:StyleID="data" ss:Index="5"><Data ss:Type="Number">-16</Data></Cell>
  <Cell ss:StyleID="data" ss:Index="6"><Data ss:Type="String">/ /</Data></Cell>
  <Cell ss:StyleID="data" ss:Index="7"><Data ss:Type="String"></Data></Cell>
  <Cell ss:StyleID="data_5" ss:Index="8"><Data ss:Type="String"></Data></Cell>
</Row>
I'm not seeing a whole lot of "fluff" that I can safely strip out of there.
The reason you're finding tagsets.excelxp output far larger than XLSX is twofold.
Firstly, the file format itself: tagsets produce one XML file with all the required information in it, including styles, data, and such like. XLSX, on the other hand, is a zipped folder structure with multiple XML files inside it. So firstly the file is zipped (which is why the "Open" part of the format name is amusing), i.e. compressed, and secondly the individual XML files are relational, i.e. no data item is stored twice, a bit like a database.
Now that being said, excelxp is a nice way to get reports - i.e. things people look at. It is not a data transfer solution. Thus if you start looking at larger amounts of data to send to other parties you should invest some time looking into proper data transfer file formats:
CSV - this is plain text, delimited (and can be opened and read by Excel's native CSV parser too). It is very small since it contains no information about the data other than the header. Note, though, that as with any data transfer you would design a data transfer document detailing elements such as file format, structure, etc.
XML - this would be slightly larger than CSV, but contains more information about the data. It can be your own XML structure.
Dataset (or XPT) - these would be the smallest file sizes as they are binary. They are also the least portable and open. Generally speaking, I tend to avoid proprietary files as much as possible - 32-bit catalogs not being usable on 64-bit systems is a prime example of why using proprietary file formats is bad.
Also note that, depending on your SAS version, you have other ways to get data to Excel if you really need to. The Excel LIBNAME engine allows you to copy data into native Excel format. PROC EXPORT can create binary XLS files. Or you could build the XLSX yourself - I have done it, and it's a pain, but possible.
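As a minimal sketch of the PROC EXPORT route mentioned above (dataset name and path are placeholders; the available DBMS values depend on your SAS version and whether SAS/ACCESS Interface to PC Files is licensed):

```sas
/* Export a dataset to a binary Excel file. WORK.REPORT_DATA and
   the output path are hypothetical; DBMS=XLS assumes SAS/ACCESS
   to PC Files is available on this host. */
proc export data=work.report_data
            outfile="/home/user/report.xls"
            dbms=xls
            replace;
run;
```

This transfers the data values only; unlike the ExcelXP tagset, it carries none of the report formatting, which is why it suits data transfer rather than presentation.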
I would start, as always, with the planning section (that's 99% of the work in any software development):
What do I need this data for.
What is the best medium for that data
What are the size constraints
What are the portability constraints
What are the recipients conditions
etc.
Then make a decision based on that information about what to use.
@RW9, thank you. Good thoughts all.
For this particular data, the output is meant to be human readable and compatible with a standard 8.5 x 11 sheet of paper for printing if need be. CSV would certainly be a far more compact transport mechanism, but then the recipient has to do a lot of formatting each and every time a report is e-mailed out. With the XML, I can have 95+ percent of the formatting done programmatically, and the recipient need only adjust the margins and optimize the column widths.
We have a situation here that might provide some help to you.
Our requirement was to provide the external partners of our sales force (insurance) with their individual sales volume data.
We create a set (a large set, ~10000) of HTML pages containing their reports, and copy those out to a server in the DMZ.
Creating the web pages happens in a single data _null_ step that eats next to no memory at all and is blindingly fast. All formatting is done through SAS data step code and writing the proper HTML tags.
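A minimal sketch of that DATA _NULL_ technique, with a hypothetical dataset (WORK.SALES), variables, and output path:

```sas
/* Write an HTML report directly from a DATA step. No ODS involved,
   so each line goes straight to the file as it is generated.
   Dataset, variables, and path are placeholders. */
data _null_;
  set work.sales end=last;
  file "/reports/sales.html";
  /* emit the document header and table header once */
  if _n_ = 1 then put '<html><body><table border="1">'
                      '<tr><th>Partner</th><th>Volume</th></tr>';
  /* one table row per observation */
  put '<tr><td>' partner +(-1) '</td><td>' volume '</td></tr>';
  /* close the document after the last observation */
  if last then put '</table></body></html>';
run;
```

Because PUT writes to the file buffer immediately, memory use is flat regardless of how many observations flow through, which is why this approach scales where the tagset does not.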
Takes quite a lot of programming, but the result justifies the effort.
Tabular data on web pages can be read or copy/pasted directly into spreadsheet programs, BTW.
Thank you, @Kurt_Bremser
I've used that technique (writing HTML in a DATA step) for other jobs. It is very, very fast. It's also a lot of work. I certainly could do that, but with XML in this case. I could run the ODS ExcelXP code to generate the header XML and save the resultant text in a file.
I would then use a DATA step to write out all the individual rows. The first portion of the DATA step code would simply read in the header XML from the file and write it out to the new XML file before it starts writing out all the individual row XML. I would have almost no limit to the number of rows I could process, at least in terms of memory. Disk space would be used, but that's generally more available than memory.
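That header-reuse idea might be sketched like this, assuming the saved header file ends just after the opening <Table> tag so rows can be appended (file names, style IDs, and variables are hypothetical):

```sas
/* Step 1: copy the previously saved ExcelXP header verbatim. */
data _null_;
  infile "/data/header.xml";
  file "/data/final_report.xml";
  input;
  put _infile_;
run;

/* Step 2: append one <Row> per observation, then close the
   workbook. Dataset and variables are placeholders. */
data _null_;
  file "/data/final_report.xml" mod;  /* mod = append */
  set work.report_data end=last;
  put '<Row ss:AutoFitHeight="1">';
  put '  <Cell ss:StyleID="data"><Data ss:Type="String">'
      label +(-1) '</Data></Cell>';
  put '  <Cell ss:StyleID="data"><Data ss:Type="Number">'
      value '</Data></Cell>';
  put '</Row>';
  if last then put '</Table></Worksheet></Workbook>';
run;
```

As with the HTML approach, each row streams to disk as it is written, so the row count is limited by disk space rather than memory.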
@data_null__, I was hoping to find a way to make my SAS program use less memory or at least use it more efficiently, but allocating more memory does work. I just hope I don't get fired for eating all the memory every time I run this. 🙂
The syntax for UNIX is:
nohup sas -noterminal -memsize 2G Frozen_Profile.sas
The operative portion for the memory allocation is "-memsize 2G".
For SAS Enterprise Guide users, one would edit sasv9.cfg in one's home directory and add the following line:
-memsize 2G
Enterprise Guide users may also want to refer to KB 40480: http://support.sas.com/kb/40/480.html
Thanks!