Satish_Parida
Lapis Lazuli | Level 10

Hi Experts,

 

I am having a little trouble tuning an import process from XML to SAS.

I am using a Linux system with SAS 9.4 installed, and SAS XML Mapper 9.43 to interpret the data.

 

For each XML tag we create a table, then SQL-join the tables on the system-generated ordinal columns to get the final table.

 

I am using PROC COPY to import all the XML tags into SAS tables.

 

The problem is PROC COPY, which is slow. In the log I can see the following:

 

INFO: Data set block I/O cannot be used because:
INFO: - The data sets use different engines, have different variables or have attributes that may differ.

So basically there is no use in increasing BUFSIZE or BUFNO.

 

Could you please suggest any other practice that can boost performance?

 

Note: I tried COMPRESS=NO, but that did not help.

 

Regards

Satish

ACCEPTED SOLUTION
Patrick
Opal | Level 21

@Satish_Parida

O.K. So what you're doing is basically:

1. You've got two libnames, one for your source and one for your target library.

2. Your source library points to a folder containing XML files (and you're using the XMLV2 engine)

3. You use Proc Copy to copy your data from source to target.
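
For reference, the setup described in the three points above could look roughly like the sketch below. It is only an illustration: the paths, the map file name and the librefs `src`, `tgt` and `xmap` are assumptions, not taken from the original post, and the source libref is shown pointing at a single XML file.

```sas
/* Sketch only: every path and name below is hypothetical. */
filename xmap '/data/maps/orders.map';           /* XMLMap built with SAS XML Mapper */

libname  src  xmlv2 '/data/xml_in/orders_0001.xml'
              xmlmap=xmap access=readonly;       /* source: XML read through the map */
libname  tgt  '/data/sas_out';                   /* target: ordinary SAS library     */

/* Copy every table the XMLMap defines into the target library. */
/* Each row has to be parsed out of the XML on the way, which   */
/* is why block I/O cannot be used for this copy.               */
proc copy in=src out=tgt;
run;

libname src clear;
```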

 

What's most likely the bottleneck in this process is not the copy process as such but the parsing of the source XML.

 

Your current process reads, parses and copies one XML file after the other (single-threaded). If you want to speed up the process, change your code:

1. Create a list of all XML files in the folder.

2. Point the source libname to a specific XML file instead of the folder containing the XML files (set dynamically from the list created in step 1).

3. Implement parallel processing, ideally via RSUBMIT on the same machine or, if you don't have SAS/CONNECT licensed, via %SYSEXEC.

4. Run the child processes in parallel (one per XML file to be converted to a SAS table).
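
The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a tested implementation: it uses SYSTASK COMMAND with NOWAIT to start the child sessions, which is a swapped-in alternative to the RSUBMIT or %SYSEXEC routes named above and, like them, needs the XCMD option to be allowed. All paths, the child program name `import_one_xml.sas`, the `sas` command being on the PATH, and the batch size of 4 are hypothetical.

```sas
/* Sketch only: all paths, names and the batch size are hypothetical. */
%let xml_dir      = /data/xml_in;                  /* folder holding the XML files  */
%let child_pgm    = /data/code/import_one_xml.sas; /* child program, one file a run */
%let log_dir      = /data/logs;                    /* one log per child session     */
%let max_parallel = 4;                             /* cap on concurrent sessions    */

/* Step 1: list the XML files in the folder */
data work.xml_files;
  length fname $256;
  rc  = filename('xdir', "&xml_dir");
  did = dopen('xdir');
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'xml' then output;
    end;
    rc = dclose(did);
  end;
  rc = filename('xdir');                           /* release the fileref */
  keep fname;
run;

/* Steps 2-4: one child SAS session per file, started in batches */
%macro import_all;
  %local i n batch_tasks;
  proc sql noprint;
    select fname into :xml1- from work.xml_files;
  quit;
  %let n = &sqlobs;
  %let batch_tasks = ;
  %do i = 1 %to &n;
    /* NOWAIT returns at once, so the children run side by side */
    systask command
      "sas -sysin &child_pgm -sysparm ""&xml_dir/&&xml&i"" -log &log_dir/import_&i..log"
      taskname=t&i status=rc&i nowait;
    %let batch_tasks = &batch_tasks t&i;
    /* Once a batch is full (or the list ends), wait for it to finish */
    %if %sysfunc(mod(&i, &max_parallel)) = 0 or &i = &n %then %do;
      waitfor _all_ &batch_tasks;
      %let batch_tasks = ;
    %end;
  %end;
%mend import_all;
%import_all
```

The child program receives the path of its XML file in `&SYSPARM`. A hypothetical version, writing to one target folder per XML file so that parallel sessions do not overwrite each other's tables, might be:

```sas
/* import_one_xml.sas (hypothetical child program) */
options dlcreatedir;                               /* create the target folder if missing */
filename xmap '/data/maps/orders.map';
libname  src  xmlv2 "&sysparm" xmlmap=xmap access=readonly;
libname  tgt  "/data/sas_out/%scan(&sysparm, -2, /.)";  /* folder named after the file */
proc copy in=src out=tgt;
run;
```

Waiting per batch is the simplest way to cap concurrency; lowering `max_parallel` trades elapsed time for less I/O and CPU load.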


7 REPLIES
Reeza
Super User

What do you mean by copying XML? XML files are really just text files, so if all you want is another physical copy, that is easy to do with the FCOPY() function.
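
A minimal sketch of such a physical copy, assuming two hypothetical file paths. The filerefs use RECFM=N so the file is copied byte for byte rather than record by record:

```sas
/* Sketch only: both paths are hypothetical. */
filename src '/data/xml_in/orders_0001.xml'     recfm=n;
filename dst '/data/xml_backup/orders_0001.xml' recfm=n;

data _null_;
  length msg $384;
  rc = fcopy('src', 'dst');        /* returns 0 when the copy succeeds */
  if rc ne 0 then do;
    msg = sysmsg();                /* text of the last file-function error */
    put 'ERROR: copy failed. ' rc= msg=;
  end;
run;

filename src clear;
filename dst clear;
```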

 

Or do you mean importing an XML file instead, i.e. reading/parsing an XML file into a SAS data set?



Satish_Parida
Lapis Lazuli | Level 10

I mean importing XML data using a mapper. (Question edited.)
I am using PROC COPY now, which should be efficient for this work, but because block I/O is not being used, it is slow.

Tom
Super User

We probably need a lot more detail about the file to provide any specific advice.

In general an XML file is not an efficient storage format, since it is normally very verbose. If you want speed do not use XML.

 

You might be able to find a non-SAS utility that could convert the XML into a more usable format, like a simple CSV file, that SAS can read more efficiently.

 

Satish_Parida
Lapis Lazuli | Level 10
1. I cannot change the input source from XML to anything else.
2. I cannot change the existing architecture; my job is to tune the process.
I am reading thousands of XML part files one by one, joining the tables from each into a single table, and appending all of those to get the final table for all the files.

Out of all these steps, the PROC COPY that we use to copy data from XML into SAS is the slowest, because it works row by row instead of using bulk I/O.

Satish_Parida
Lapis Lazuli | Level 10
Thank you, I think this will be the only way.
I had tried all the other options; none of them changed the behavior in any way.
I think the only disadvantage of running in parallel will be high I/O and CPU consumption; I will check that. Thank you.
Patrick
Opal | Level 21

@Satish_Parida

Servers are often not used to capacity, but if that's a problem for you then limit the number of parallel processes to something acceptable.
