BookmarkSubscribeRSS Feed
Haikuo
Onyx | Level 15

Dear All,

As part of my monthly work,  50ish SAS tables are output to the format of Excel spreadsheets, so other users are able to view them using Excel. These SAS tables all have the same data structure, eg. 10 variables (5 character, 5 numeric), with from 2k to 400k records. I used to do proc export, with 400k records taking about 20-30 seconds.

Recently end users asked to see some format (color, font, preset filter, fixed headerline etc),  natually  I decide to use tagsets.exclexp. To my dismay, for 400k records it took about 25-30 minutes, and file size jumped 10 times larger , from 15M to 150M. If I remove those style/format options, it of course accelerates the process, but then it beats the purpose of this proc-export to ODS transition.

Please advise if there are ways to get around this.

Thank you in advance,

Haikuo

12 REPLIES 12
Haikuo
Onyx | Level 15

More details: I used tagset.excelxp plus Proc print. I will post my code once the SAS finish this round, which may take another hour or 2.

Thanks,

Haikuo

Haikuo
Onyx | Level 15

Here is my log:

842     %macro hd_out;

843  ods listing close;

844  ods tagsets.excelxp

845     file='I:\temp\gap\gap_hd.xml'

846     style=xlsansprinter;

847  %do i=1 %to 3;

848  ods tagsets.excelxp

849      options (sheet_name="&&sheetname&i"

850               embedded_titles ='no'

851               autofilter = 'yes'

852               frozen_headers='1');

853

854      proc printdata=code.hd_&i;

855      run;

856

857  %end;

858      odstagsets.excelxp close;

859  %mend;

860

861

862  %hd_out;

MLOGIC(HD_OUT):  Beginning execution.

MPRINT(HD_OUT):   ods listing close;

MPRINT(HD_OUT):   ods tagsets.excelxpfile='I:\temp\gap\gap_hd.xml' style=xlsanspr

NOTE: Writing TAGSETS.EXCELXP Body file:I:\temp\gap\gap_hd.xml

NOTE: This is the Excel XP tagset (Compatible withSAS 9.1.3 and above, v1.116, 08

options(doc='help') to the ods statement for moreinformation.

MLOGIC(HD_OUT):  %DO loop beginning; indexvariable I; start value is 1; stop valu

      value is 1.

SYMBOLGEN:  && resolves to &.

SYMBOLGEN:  Macro variable I resolves to 1

SYMBOLGEN:  Macro variable SHEETNAME1resolves to HD Adquacy I-Monthly Measurement

           Delivered HD Dose

MPRINT(HD_OUT):   ods tagsets.excelxpoptions ( sheet_name="HD Adquacy I-Monthly M

of Delivered HD Dose" embedded_titles ='no'autofilter = 'yes' frozen_headers='1')

SYMBOLGEN:  Macro variable I resolves to 1

MPRINT(HD_OUT):   proc printdata=code.hd_1;

MPRINT(HD_OUT):   run;

NOTE: There were 255724 observations read from thedata set CODE.HD_1.

NOTE: PROCEDURE PRINT used (Total process time):

      realtime           26:41.15

      cputime            26:40.64

MLOGIC(HD_OUT):  %DO loop index variable I isnow 2; loop will iterate again.

SYMBOLGEN:  && resolves to &.

SYMBOLGEN:  Macro variable I resolves to 2

SYMBOLGEN:  Macro variable SHEETNAME2resolves to HD Adquacy II-Method of Measurem

           Delivered Dose

MPRINT(HD_OUT):   ods tagsets.excelxpoptions ( sheet_name="HD Adquacy II-Method o

Measurement of Delivered Dose"embedded_titles ='no' autofilter = 'yes' frozen_hea

SYMBOLGEN:  Macro variable I resolves to 2

MPRINT(HD_OUT):   proc printdata=code.hd_2;

MPRINT(HD_OUT):   run;

NOTE: There were 367535 observations read from thedata set CODE.HD_2.

NOTE: PROCEDURE PRINT used (Total process time):

      realtime           38:10.41

      cputime            38:09.70

MLOGIC(HD_OUT):  %DO loop index variable I isnow 3; loop will iterate again.

SYMBOLGEN:  && resolves to &.

SYMBOLGEN:  Macro variable I resolves to 3

SYMBOLGEN:  Macro variable SHEETNAME3 resolvesto HD Adquacy III-Minimum Delivered

           Hemodialysis Dose

MPRINT(HD_OUT):   ods tagsets.excelxpoptions ( sheet_name="HD Adquacy III-Minimum

Hemodialysis Dose" embedded_titles ='no'autofilter = 'yes' frozen_headers='1');

SYMBOLGEN:  Macro variable I resolves to 3

MPRINT(HD_OUT):   proc printdata=code.hd_3;

MPRINT(HD_OUT):   run;

NOTE: There were 30981 observations read from thedata set CODE.HD_3.

NOTE: PROCEDURE PRINT used (Total process time):

      realtime           1:37.24

      cputime            1:37.15

MLOGIC(HD_OUT):  %DO loop index variable I isnow 4; loop will not iterate again.

MPRINT(HD_OUT):   ods tagsets.excelxpclose;

MLOGIC(HD_OUT):  Ending execution.

My test run only involved 3 tables, and it already takes me more than 1 hour. I am new to tagsets.excelxp syntax, so it is only normal I may have something in my code to drag slow the whole process. However, this time, I ended up with a xml file larger than 1Gb, and it takes excel more than 5 minutes just to load it. All this actually makes me think that tagset.excelxp is designed only for 'reporting' purpose instead of data outputting like this.

Thank you all and any of your comments and suggestions will be much appreciated!

Haikuo

Cynthia_sas
SAS Super FREQ

Hi:

  You might want to work this with Tech Support to see how you can improve performance.

  But I do have an observation on your file size comment: A proprietary .XLS file will probably always be smaller than an XML file. TAGSETS.EXCELXP creates Microsoft "flavor" of Spreadsheet Markup Language XML 2003. An XML file is an ASCII text file -- it is likely to be quite large, but should compress nicely with a zip utility. If you want colors and fonts and cosmetic formatting, then you have to use ODS techniques to get the files for Excel to open. But with ODS methods, you are NOT creating "true, binary" Excel files -- you are merely creating ASCII text files that Excel knows how to open and render.

cynthia

ballardw
Super User

And adding to Cynthias response, many people generate "fake" spreadsheets by generating HTML output but naming the files with the XLS extension. I'm not sure how good the colors and other features you need are preserved though. You may still have some size issues which are partially solved by opening and saving back to real XLS or XLSX.

Haikuo
Onyx | Level 15

Yep, that is probably why a warning window popping up when I channel tagsets.excelxp to .xls. See my response to Cynthia's post.

Thanks,

Haikuo

Haikuo
Onyx | Level 15

Cynthia,

Thank you very much. That was really helpful in term of file size. It DOES zipped nicely! After zipping, my 1.2GB file shrank to 11M, it is like 1% compress ratio. I think I will go with zipping method when transferring data.

I have tried .xls, it is smaller, but not significant enough to make an impact, also, it trigers a pop-up warning window when using Excel to open it, and believe it or not, some end users will be freaking out upon seeing that.

I haven't tried them out yet, but I will ask them in general sense:

Performance wise, given ODS must be used,  how do you feel if I use other output methods other than proc print? Such as proc report or data _null_?

And I will contact tech support after I have exhausted my end of possibilities.

Thanks again,

Haikuo

Cynthia_sas
SAS Super FREQ

What is making the file large is the amount of markup language text that needs to be written. I don't see that amount of text changing if you use PROC PRINT or DATA _NULL_. I would guess that the time to compile your macro is negligible. Really, if there are any performance gains to be made, Tech Support are the folks to help you figure that out. That's my .02 on the subject.

cynthia

art297
Opal | Level 21

I have to add my two cents as well.  Yes, zipping can reduce the file size, but that doesn't change the facts that it took so much longer to run, that everyone now has to spend the time unzipping the files and, worse, some email systems block files that have a zip extension.

I think that it is definitely time that the proc export developers include some of the capabilities that have attracted everyone to the tagsets.  It would be nice, in my opinion, to be able to use proc export to export formatted spreadsheets, html output, pdfs, etc., without having to include the extra complexity of xml.

data_null__
Jade | Level 19

I was wondering if an empty workbook could be created that had some formatting applied "somehow".  If the EXCEL LIBNAME engine could be used to fill the workbook with data with the formatting applied because it is in the "template" workbook.

This is not an answer to the too slow problem but perhaps a way around it.

art297
Opal | Level 21

DataNull: Do I see a concept for a paper for next year's SGF starting here?

Haikuo
Onyx | Level 15

Not to answer Art, but to 'append' to the last.

Thank you all, Cynthia, Art and Null.

OK, I have got some walk-around suggestions to reduce my pain to somewhat manageable level. my original problem can be broke into 2 parts: 1. Too slow 2. Too big.  Of course they are related, big file takes long time to write.

So the first suggestion I have got is targeting both, doing so, I have to ditch ODS. In brief, it is like this:

Proc Export -- DDE + VBA macro for fancy formats.

The second suggestion is lot simpler, but only targeting 'too big'. Flow will be like:

ODS excelxp xml file -- open it and 'save as' xlsx file. It cuts down my 1.2G to 50M.

Both approaches are doable using SAS (or call within SAS) , with no manual effort involved once setting up.

I 'd appreciate your  comments and suggestions!

Haikuo

Reeza
Super User

I typically use a combination of what Data _null_ has suggested, I have a workbook that has preformatted cells and then link them to the another sheet that I fill in with a proc export or something.

I've also found that it does require a bit of manual 'labour' though it could be automated with DDE.

The issue is if you create or overwrite a worksheet in Excel I find all the links tend to break. It might be okay if you do it to a range rather than a full sheet but I haven't tried that. I could also be doing something wrong.

I'd be concerned with the speed of DDE as it requires opening excel.

One thought,In Excel you can create a data source connection that links to a CSV file. If you've formatted the fields and then update the CSV file and reload the formats should hold. It really depends on how you plan to distribute the reports though, if you need the olds to be maintained and users need access to them etc.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 3384 views
  • 4 likes
  • 6 in conversation