BookmarkSubscribeRSS Feed
shl007
Obsidian | Level 7

Is there anything that can be done to address slow performance when querying (using proc sql) datasets created by the XML libname engine? These datasets use ordinal fields created by the XML engine, and proc sql is taking forever as in not finishing after 30 mins. with just a month's worth of data. I am left joining 4-5 tables. I know the SQL ran to completion when I tested on one day's worth of data, but now I'm running on a month's worth.

Any thoughts on performance tuning are much appreciated.

5 REPLIES 5
Reeza
Super User
What size are the original XML files? Are you trying to process 2GB txt files or 4MB files?
JerryV
SAS Employee

If your table isn't large PROC COPY it to WORK and PROC SQL the WORK table.  

LinusH
Tourmaline | Level 20

The XML file structure is on of the most inefficient ever invented, only to be used for short messages and configurations files, and similar.

Don't use it for data storage/processing unless there are no other options available to you.

Data never sleeps
Kurt_Bremser
Super User

Just to give you some perspective on relative performance BASE dataset vs. XML dataset:

libname x_test xml '/folders/myfolders/test.xml';

data x_test.bweight;
set sashelp.bweight;
run;

proc freq data=sashelp.bweight;
tables married;
run;

proc freq data=x_test.bweight;
tables married;
run;

libname x_test clear;

Log from that:

 73         libname x_test xml '/folders/myfolders/test.xml';
 NOTE: Libref X_TEST was successfully assigned as follows: 
       Engine:        XML 
       Physical Name: /folders/myfolders/test.xml
 74         
 75         data x_test.bweight;
 76         set sashelp.bweight;
 77         run;
 
 NOTE: SAS-Variablenetiketten, Einleseformate und Längen werden nicht in DBMS-Tabellen geschrieben.
 NOTE: There were 50000 observations read from the data set SASHELP.BWEIGHT.
 NOTE: The data set X_TEST.BWEIGHT has 50000 observations and 10 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           16.46 seconds
       cpu time            16.31 seconds
       
 
 78         
 79         proc freq data=sashelp.bweight;
 80         tables married;
 81         run;
 
 NOTE: There were 50000 observations read from the data set SASHELP.BWEIGHT.
 NOTE:  Verwendet wurde: PROZEDUR FREQ - (Gesamtverarbeitungszeit):
       real time           0.06 seconds
       cpu time            0.05 seconds
       
 
 82         
 83         proc freq data=x_test.bweight;
 84         tables married;
 85         run;
 
 NOTE: There were 50000 observations read from the data set X_TEST.BWEIGHT.
 NOTE:  Verwendet wurde: PROZEDUR FREQ - (Gesamtverarbeitungszeit):
       real time           6.10 seconds
       cpu time            5.89 seconds
       
 
 86         
 87         libname x_test clear;
 NOTE: Libref X_TEST has been deassigned.

So you have a time factor of 100(!) with a very small dataset that is just 4MB (BASE) on disk. As XML, it grows by a factor of about 4 to 15 MB.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 630 views
  • 5 likes
  • 5 in conversation