Is there anything that can be done to address slow performance when querying (using proc sql) datasets created by the XML libname engine? These datasets use ordinal fields created by the XML engine, and proc sql is taking forever as in not finishing after 30 mins. with just a month's worth of data. I am left joining 4-5 tables. I know the SQL ran to completion when I tested on one day's worth of data, but now I'm running on a month's worth.
Any thoughts on performance tuning are much appreciated.
If your table isn't large PROC COPY it to WORK and PROC SQL the WORK table.
Why are you using XML files instead of native SAS datasets? The XML engine is for import/export, not for processing.
The XML file structure is on of the most inefficient ever invented, only to be used for short messages and configurations files, and similar.
Don't use it for data storage/processing unless there are no other options available to you.
Just to give you some perspective on relative performance BASE dataset vs. XML dataset:
libname x_test xml '/folders/myfolders/test.xml';
data x_test.bweight;
set sashelp.bweight;
run;
proc freq data=sashelp.bweight;
tables married;
run;
proc freq data=x_test.bweight;
tables married;
run;
libname x_test clear;
Log from that:
73 libname x_test xml '/folders/myfolders/test.xml'; NOTE: Libref X_TEST was successfully assigned as follows: Engine: XML Physical Name: /folders/myfolders/test.xml 74 75 data x_test.bweight; 76 set sashelp.bweight; 77 run; NOTE: SAS-Variablenetiketten, Einleseformate und Längen werden nicht in DBMS-Tabellen geschrieben. NOTE: There were 50000 observations read from the data set SASHELP.BWEIGHT. NOTE: The data set X_TEST.BWEIGHT has 50000 observations and 10 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 16.46 seconds cpu time 16.31 seconds 78 79 proc freq data=sashelp.bweight; 80 tables married; 81 run; NOTE: There were 50000 observations read from the data set SASHELP.BWEIGHT. NOTE: Verwendet wurde: PROZEDUR FREQ - (Gesamtverarbeitungszeit): real time 0.06 seconds cpu time 0.05 seconds 82 83 proc freq data=x_test.bweight; 84 tables married; 85 run; NOTE: There were 50000 observations read from the data set X_TEST.BWEIGHT. NOTE: Verwendet wurde: PROZEDUR FREQ - (Gesamtverarbeitungszeit): real time 6.10 seconds cpu time 5.89 seconds 86 87 libname x_test clear; NOTE: Libref X_TEST has been deassigned.
So you have a time factor of 100(!) with a very small dataset that is just 4MB (BASE) on disk. As XML, it grows by a factor of about 4 to 15 MB.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.