11-21-2012 08:21 AM
I'm wondering if there is a faster, more efficient way to generate multiple pdf files in terms of processing time. I'm consulting for a pharma company and currently have an automated pdf generation process in place that produces a pdf file in about 8 seconds, but I think that time can be reduced. The problem arises when pdfs are generated for 1000 or more patients. Each pdf relies on aggregating data across 5 or more different datasets (or domains), at times using a merge and at times the domain data itself. The data used for the pdf is not big at all. Since it takes 8 seconds to render a single pdf file, 1000 files take about 8000 seconds, or a little over 2 hours. This is becoming a problem, since the number of patients differs depending on the study being analysed. I made sure to optimize the sorting procedures, running them only when needed instead of multiple times. I also tried limiting the I/O with respect to dataset generation. But these approaches didn't seem to help much.
A macro defines the layout of the pdf and its various settings, and the macro accepts parameters that define the data for each pdf. However, the call to this macro is made from within a data _null_ step that reads the master subject table, sequentially checks each subject against certain criteria and, if the criteria are met, uses call execute to launch the pdf generation for that subject.
I'm wondering whether performance is affected by calling the pdf generation macro from within a data _null_ step, as opposed to making each call individually without depending on a data step.
Any help is greatly appreciated.
11-21-2012 08:49 AM
I'm not certain from your description, but it sounds like you do everything for each subject in a loop (a macro, I reckon). I would take all the data manipulation out of the loop and do it for all subjects at once (by subject). This way each of those merges or other visits to the various domains is done only once, albeit for all subjects. You then end up with one file of summary data for all subjects of interest, and you are left with just the PDF part.
I'm not certain, but you may be able to get ODS PDF to create new files based on the value of a BY variable (subject). That would allow that part of the process to be reduced to one step with no macro looping.
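A minimal sketch of doing the data preparation once for all subjects, rather than once per subject inside the macro. All dataset and variable names here (work.dm, work.ae, work.summary, usubjid) are hypothetical stand-ins, since the actual domains were not posted:

```sas
/* Hypothetical domain datasets: work.dm (one row per subject) and
   work.ae (many rows per subject). usubjid is an assumed subject key. */
proc sort data=work.dm out=work.dm_s;
  by usubjid;
run;

proc sort data=work.ae out=work.ae_s;
  by usubjid;
run;

/* One merge for every subject, instead of one merge per PDF */
data work.summary;
  merge work.dm_s(in=in_dm) work.ae_s;
  by usubjid;
  if in_dm;   /* keep only subjects present in the subject-level table */
run;
```

With a single pre-built dataset like this, the per-subject step would only need to filter or BY-process work.summary rather than revisit each domain.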
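A rough sketch of the BY-variable idea, assuming the NEWFILE=BYGROUP option of the ODS PDF statement behaves as documented for your SAS release. The dataset work.summary and the variables usubjid, aeterm and aesev are hypothetical, and the input is assumed to be sorted by usubjid:

```sas
/* Assumption: work.summary holds the pre-merged data for all subjects,
   sorted by the hypothetical key usubjid. */
ods pdf file='subject.pdf' newfile=bygroup;

proc report data=work.summary nowd;
  by usubjid;             /* each BY group should start a new PDF file */
  column aeterm aesev;    /* hypothetical report columns */
run;

ods pdf close;
```

As far as I know, the extra files are named by incrementing the base name rather than by the BY value, so getting the subject ID into each file name would need a separate renaming step.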
11-21-2012 09:05 AM
Thanks a million for your response.
There is a macro that produces the pdf based on the parameters it is passed, one of them being the subjectid. All this macro does is use PROC REPORT to print the data to the pdf. The template structure is defined outside of this macro.
Then a data step does the following:
set masterSubjs; /* Depending on the data, this dataset can have anywhere between 200 and 3000 subjects, each of which needs its own pdf file. */
<.... conditional checks ..if..else>
I was wondering whether making the call to the macro from a data step, rather than as a straight call %printPdf(<parameters>), affects the processing time.
11-21-2012 09:30 AM
I would change the macro calls made through call execute to include %NRSTR(
This will delay execution of the macros until all the calls are generated. I doubt that will make it faster, but in most cases using NRSTR to delay execution of the macro is what you want with CALL EXECUTE.
Is there any processing done in %PRINTPDF that could be done outside that macro?
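As an illustration of the NRSTR suggestion, here is a minimal sketch of what the data step might look like. The parameter name subjid= is hypothetical (the real signature of %printPdf was not posted), and the conditional checks are left elided as in the original post:

```sas
data _null_;
  set masterSubjs;
  /* ... conditional checks elided in the original post ... */

  /* %nrstr masks the macro call so CALL EXECUTE only queues the text;
     the macro itself runs after this data step has finished.
     subjid= is an assumed parameter name, not the poster's actual one. */
  call execute(cats('%nrstr(%printPdf(subjid=', subjectid, '))'));
run;
```

Without %nrstr, the macro statements inside %printPdf would execute as soon as CALL EXECUTE submits the call, while the data step is still running, which can cause timing problems with macro variables set by CALL SYMPUT inside the macro.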
11-21-2012 09:36 AM
Oh! Thanks for the tip. I could try that. The only processing done in the pdf macro is based on the parameters. Other than that, no data-dependent processing is done.
I was also wondering whether setting system options, such as telling SAS to split the work sessions and setting the compress option to yes, would help.
11-21-2012 06:09 PM
I doubt that using the COMPRESS option will make a difference. If all you're doing is sending output to PDF, the PDF COMPRESS= option is currently set to level 6 by default. If you wanted to make less compressed PDF files, then you could use COMPRESS=1 in your ODS PDF invocation:
ods pdf file='xxx.pdf' compress=1;
But before you try to split the work sessions to improve performance, I'd recommend opening a track with Tech Support to see whether they have any suggestions, including that one. This is the kind of question where someone needs to look at all your code and your system settings and your configuration to help you zero in on the best performance tweaks.