Dear All,
As part of my monthly work, 50ish SAS tables are output to the format of Excel spreadsheets, so other users are able to view them using Excel. These SAS tables all have the same data structure, eg. 10 variables (5 character, 5 numeric), with from 2k to 400k records. I used to do proc export, with 400k records taking about 20-30 seconds.
Recently end users asked to see some format (color, font, preset filter, fixed headerline etc), natually I decide to use tagsets.exclexp. To my dismay, for 400k records it took about 25-30 minutes, and file size jumped 10 times larger , from 15M to 150M. If I remove those style/format options, it of course accelerates the process, but then it beats the purpose of this proc-export to ODS transition.
Please advise if there are ways to get around this.
Thank you in advance,
Haikuo
More details: I used tagset.excelxp plus Proc print. I will post my code once the SAS finish this round, which may take another hour or 2.
Thanks,
Haikuo
Here is my log:
842 %macro hd_out;
843 ods listing close;
844 ods tagsets.excelxp
845 file='I:\temp\gap\gap_hd.xml'
846 style=xlsansprinter;
847 %do i=1 %to 3;
848 ods tagsets.excelxp
849 options (sheet_name="&&sheetname&i"
850 embedded_titles ='no'
851 autofilter = 'yes'
852 frozen_headers='1');
853
854 proc printdata=code.hd_&i;
855 run;
856
857 %end;
858 odstagsets.excelxp close;
859 %mend;
860
861
862 %hd_out;
MLOGIC(HD_OUT): Beginning execution.
MPRINT(HD_OUT): ods listing close;
MPRINT(HD_OUT): ods tagsets.excelxpfile='I:\temp\gap\gap_hd.xml' style=xlsanspr
NOTE: Writing TAGSETS.EXCELXP Body file:I:\temp\gap\gap_hd.xml
NOTE: This is the Excel XP tagset (Compatible withSAS 9.1.3 and above, v1.116, 08
options(doc='help') to the ods statement for moreinformation.
MLOGIC(HD_OUT): %DO loop beginning; indexvariable I; start value is 1; stop valu
value is 1.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable SHEETNAME1resolves to HD Adquacy I-Monthly Measurement
Delivered HD Dose
MPRINT(HD_OUT): ods tagsets.excelxpoptions ( sheet_name="HD Adquacy I-Monthly M
of Delivered HD Dose" embedded_titles ='no'autofilter = 'yes' frozen_headers='1')
SYMBOLGEN: Macro variable I resolves to 1
MPRINT(HD_OUT): proc printdata=code.hd_1;
MPRINT(HD_OUT): run;
NOTE: There were 255724 observations read from thedata set CODE.HD_1.
NOTE: PROCEDURE PRINT used (Total process time):
realtime 26:41.15
cputime 26:40.64
MLOGIC(HD_OUT): %DO loop index variable I isnow 2; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable SHEETNAME2resolves to HD Adquacy II-Method of Measurem
Delivered Dose
MPRINT(HD_OUT): ods tagsets.excelxpoptions ( sheet_name="HD Adquacy II-Method o
Measurement of Delivered Dose"embedded_titles ='no' autofilter = 'yes' frozen_hea
SYMBOLGEN: Macro variable I resolves to 2
MPRINT(HD_OUT): proc printdata=code.hd_2;
MPRINT(HD_OUT): run;
NOTE: There were 367535 observations read from thedata set CODE.HD_2.
NOTE: PROCEDURE PRINT used (Total process time):
realtime 38:10.41
cputime 38:09.70
MLOGIC(HD_OUT): %DO loop index variable I isnow 3; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable SHEETNAME3 resolvesto HD Adquacy III-Minimum Delivered
Hemodialysis Dose
MPRINT(HD_OUT): ods tagsets.excelxpoptions ( sheet_name="HD Adquacy III-Minimum
Hemodialysis Dose" embedded_titles ='no'autofilter = 'yes' frozen_headers='1');
SYMBOLGEN: Macro variable I resolves to 3
MPRINT(HD_OUT): proc printdata=code.hd_3;
MPRINT(HD_OUT): run;
NOTE: There were 30981 observations read from thedata set CODE.HD_3.
NOTE: PROCEDURE PRINT used (Total process time):
realtime 1:37.24
cputime 1:37.15
MLOGIC(HD_OUT): %DO loop index variable I isnow 4; loop will not iterate again.
MPRINT(HD_OUT): ods tagsets.excelxpclose;
MLOGIC(HD_OUT): Ending execution.
My test run only involved 3 tables, and it already takes me more than 1 hour. I am new to tagsets.excelxp syntax, so it is only normal I may have something in my code to drag slow the whole process. However, this time, I ended up with a xml file larger than 1Gb, and it takes excel more than 5 minutes just to load it. All this actually makes me think that tagset.excelxp is designed only for 'reporting' purpose instead of data outputting like this.
Thank you all and any of your comments and suggestions will be much appreciated!
Haikuo
Hi:
You might want to work this with Tech Support to see how you can improve performance.
But I do have an observation on your file size comment: A proprietary .XLS file will probably always be smaller than an XML file. TAGSETS.EXCELXP creates Microsoft "flavor" of Spreadsheet Markup Language XML 2003. An XML file is an ASCII text file -- it is likely to be quite large, but should compress nicely with a zip utility. If you want colors and fonts and cosmetic formatting, then you have to use ODS techniques to get the files for Excel to open. But with ODS methods, you are NOT creating "true, binary" Excel files -- you are merely creating ASCII text files that Excel knows how to open and render.
cynthia
And adding to Cynthias response, many people generate "fake" spreadsheets by generating HTML output but naming the files with the XLS extension. I'm not sure how good the colors and other features you need are preserved though. You may still have some size issues which are partially solved by opening and saving back to real XLS or XLSX.
Yep, that is probably why a warning window popping up when I channel tagsets.excelxp to .xls. See my response to Cynthia's post.
Thanks,
Haikuo
Cynthia,
Thank you very much. That was really helpful in term of file size. It DOES zipped nicely! After zipping, my 1.2GB file shrank to 11M, it is like 1% compress ratio. I think I will go with zipping method when transferring data.
I have tried .xls, it is smaller, but not significant enough to make an impact, also, it trigers a pop-up warning window when using Excel to open it, and believe it or not, some end users will be freaking out upon seeing that.
I haven't tried them out yet, but I will ask them in general sense:
Performance wise, given ODS must be used, how do you feel if I use other output methods other than proc print? Such as proc report or data _null_?
And I will contact tech support after I have exhausted my end of possibilities.
Thanks again,
Haikuo
What is making the file large is the amount of markup language text that needs to be written. I don't see that amount of text changing if you use PROC PRINT or DATA _NULL_. I would guess that the time to compile your macro is negligible. Really, if there are any performance gains to be made, Tech Support are the folks to help you figure that out. That's my .02 on the subject.
cynthia
I have to add my two cents as well. Yes, zipping can reduce the file size, but that doesn't change the facts that it took so much longer to run, that everyone now has to spend the time unzipping the files and, worse, some email systems block files that have a zip extension.
I think that it is definitely time that the proc export developers include some of the capabilities that have attracted everyone to the tagsets. It would be nice, in my opinion, to be able to use proc export to export formatted spreadsheets, html output, pdfs, etc., without having to include the extra complexity of xml.
I was wondering if an empty workbook could be created that had some formatting applied "somehow". If the EXCEL LIBNAME engine could be used to fill the workbook with data with the formatting applied because it is in the "template" workbook.
This is not an answer to the too slow problem but perhaps a way around it.
DataNull: Do I see a concept for a paper for next year's SGF starting here?
Not to answer Art, but to 'append' to the last.
Thank you all, Cynthia, Art and Null.
OK, I have got some walk-around suggestions to reduce my pain to somewhat manageable level. my original problem can be broke into 2 parts: 1. Too slow 2. Too big. Of course they are related, big file takes long time to write.
So the first suggestion I have got is targeting both, doing so, I have to ditch ODS. In brief, it is like this:
Proc Export -- DDE + VBA macro for fancy formats.
The second suggestion is lot simpler, but only targeting 'too big'. Flow will be like:
ODS excelxp xml file -- open it and 'save as' xlsx file. It cuts down my 1.2G to 50M.
Both approaches are doable using SAS (or call within SAS) , with no manual effort involved once setting up.
I 'd appreciate your comments and suggestions!
Haikuo
I typically use a combination of what Data _null_ has suggested, I have a workbook that has preformatted cells and then link them to the another sheet that I fill in with a proc export or something.
I've also found that it does require a bit of manual 'labour' though it could be automated with DDE.
The issue is if you create or overwrite a worksheet in Excel I find all the links tend to break. It might be okay if you do it to a range rather than a full sheet but I haven't tried that. I could also be doing something wrong.
I'd be concerned with the speed of DDE as it requires opening excel.
One thought,In Excel you can create a data source connection that links to a CSV file. If you've formatted the fields and then update the CSV file and reload the formats should hold. It really depends on how you plan to distribute the reports though, if you need the olds to be maintained and users need access to them etc.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.