Fastest way of Importing XML

Frequent Contributor
Posts: 112
Hi Experts,

 

I am having a little trouble tuning an import process from XML to SAS.

I am on a Linux system with SAS 9.4 installed, and I use XML Mapper 9.43 to interpret the data.

We create one table per XML tag and then join the tables with SQL on the system-generated ordinal columns to get the final table.

I use PROC COPY to import all the XML tags into SAS tables.
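
For readers following along, the setup described above might look roughly like the sketch below. It is only an illustration under assumptions: the paths, the librefs `xmlsrc` and `target`, and the map file name are hypothetical and do not come from the original post.

```sas
/* All paths and librefs here are placeholders, not the poster's values. */

/* XMLV2 engine: read one XML file through a map built with XML Mapper. */
libname xmlsrc xmlv2 '/data/xml/part_0001.xml'
        xmlmap='/data/maps/parts.map'
        access=readonly;

/* Base SAS library that receives the imported tables. */
libname target '/data/sas/stage';

/* Copy every mapped table from the XML library into SAS data sets. */
proc copy in=xmlsrc out=target;
run;

libname xmlsrc clear;
```

Because the source and target libraries use different engines, PROC COPY has to move the rows one at a time here, which is consistent with the INFO message quoted in the post.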

 

The problem is PROC COPY, which is slow. In the log I can see the following:

 

INFO: Data set block I/O cannot be used because:
INFO: - The data sets use different engines, have different variables or have attributes that may differ.

So basically there is no point in increasing the buffer size or BUFNO.

Could you please suggest any other practice that would boost performance?

Note: I tried COMPRESS=NO, but that did not help.

 

Regards

Satish



All Replies
Super User
Posts: 23,980

Re: Fastest way of Copying XML

Posted in reply to Satish_Parida

What do you mean by copying XML? XML files are really just text files, so if all you want is another physical copy, that is easy with the FCOPY() function.

Or do you mean importing an XML file instead, i.e. reading and parsing an XML file into a SAS data set?
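
If a plain physical copy were the goal, a minimal sketch with FCOPY() might look like this. The paths and the filerefs `src` and `dst` are placeholders; RECFM=N is used so the file is copied byte for byte.

```sas
/* Placeholder paths; RECFM=N treats the files as binary streams. */
filename src '/data/xml/part_0001.xml'    recfm=n;
filename dst '/data/backup/part_0001.xml' recfm=n;

data _null_;
  length msg $256;
  rc = fcopy('src', 'dst');      /* 0 means the copy succeeded */
  if rc ne 0 then do;
    msg = sysmsg();              /* text of the last warning or error */
    put 'ERROR: FCOPY failed: ' msg;
  end;
run;

filename src clear;
filename dst clear;
```

This only duplicates the file; it does not parse the XML into a SAS data set.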



Frequent Contributor
Posts: 112

Re: Fastest way of Copying XML

I mean importing XML data using a mapper. (Question edited.)
I am using PROC COPY now, which should be efficient for this work, but because block I/O is not being used, it is slow.

Super User
Super User
Posts: 8,272

Re: Fastest way of Copying XML

Posted in reply to Satish_Parida

We probably need a lot more detail about the file to provide any specific advice.

In general an XML file is not an efficient storage format, since it is normally very verbose. If you want speed do not use XML.

 

You might be able to find a non-SAS utility that could convert the XML into a more usable format, like a simple CSV file, that SAS can read more efficiently.

 

Frequent Contributor
Posts: 112

Re: Fastest way of Copying XML

1. I cannot change the input source from XML to anything else.
2. I cannot change the existing architecture; my job is to tune the process.
I read thousands of XML part files one by one. For each file I join its tables into a single table, and then I append all of those to get the final table for all the files.

Of all these steps, PROC COPY, which we use to copy data from XML into SAS, is the slowest, because it works row by row instead of using block I/O.
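
The join-and-append part of that pipeline could be sketched as follows for a single part file. Everything named here is hypothetical: the tables `header` and `detail`, the key `header_ordinal`, the columns `item_id` and `quantity`, and the `target` library are stand-ins for whatever the real map produces.

```sas
/* Hypothetical tables HEADER and DETAIL imported from one XML part file. */
/* HEADER_ORDINAL stands in for the generated key that links each detail  */
/* row to its parent header row.                                          */
proc sql;
  create table work.part_flat as
  select h.*, d.item_id, d.quantity
  from target.header as h
       inner join target.detail as d
       on h.header_ordinal = d.header_ordinal;
quit;

/* Add this part's rows to the accumulating final table. */
proc append base=target.final data=work.part_flat;
run;
```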
Solution
‎05-15-2018 01:19 AM
Respected Advisor
Posts: 4,779

Re: Fastest way of Copying XML

Posted in reply to Satish_Parida

@Satish_Parida

O.K. So what you're doing is basically:

1. You've got two libnames, one for your source and one for your target library.

2. Your source library points to a folder containing XML files (and you're using the XMLV2 engine)

3. You use Proc Copy to copy your data from source to target.

 

What's most likely the bottleneck in this process is not the copy process as such but the parsing of the source XML.

 

Your current process reads, parses and copies one XML file after the other (single-threaded). To speed up the process, change your code as follows:

1. Create a list of all XML files in the folder

2. Point the source libname to a specific XML instead of the folder containing the XML (dynamic based on the list created)

3. Implement parallel processing, ideally via RSUBMIT on the same machine or, if you don't have SAS/CONNECT licensed, via %SYSEXEC

4. Run the child processes in parallel (1 per XML to be converted to a SAS table)
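
The four steps above could be sketched roughly as below. Note that this sketch swaps in SYSTASK COMMAND with NOWAIT instead of RSUBMIT or %SYSEXEC, because SYSTASK can start a child session without waiting for it. It assumes that the XCMD option is enabled, that `sas` is on the PATH, and that a hypothetical worker program `import_one.sas` reads the file name from `&SYSPARM` and writes to its own output location so the sessions do not collide. All paths are placeholders.

```sas
%let xmldir = /data/xml;                  /* folder with the XML files (assumed) */
%let worker = /data/code/import_one.sas;  /* hypothetical per-file worker        */
%let logdir = /data/logs;                 /* assumed log folder                  */
%let nfiles = 0;

/* Step 1: list the XML files in the folder. */
data work.xmlfiles;
  length fname $256;
  rc  = filename('xdir', "&xmldir");
  did = dopen('xdir');
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'xml' then output;
    end;
    rc = dclose(did);
  end;
  rc = filename('xdir');
  keep fname;
run;

/* Store the names in macro variables F1..Fn and the count in NFILES. */
data _null_;
  set work.xmlfiles end=last;
  call symputx(cats('f', _n_), fname);
  if last then call symputx('nfiles', _n_);
run;

/* Steps 2 to 4: one child SAS session per file, started without waiting. */
%macro launch_all;
  %local i tasks;
  %do i = 1 %to &nfiles;
    systask command
      "sas -sysin &worker -sysparm '&xmldir/&&f&i' -log &logdir/import_&i..log"
      taskname=t&i nowait;
    %let tasks = &tasks t&i;
  %end;
  %if &nfiles > 0 %then %do;
    waitfor _all_ &tasks;   /* block until every child has finished */
  %end;
%mend launch_all;

%launch_all
```

Launching every file at once is only sensible for a short list; with thousands of files the number of concurrent sessions would need a cap.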

Frequent Contributor
Posts: 112

Re: Fastest way of Copying XML

Thank you, I think this will be the only way.
I had tried all the other options; none of them affected the behavior in any way.
I think the only disadvantage of the parallel run will be high I/O and CPU consumption, which I will check. Thank you.
Respected Advisor
Posts: 4,779

Re: Fastest way of Copying XML

Posted in reply to Satish_Parida

@Satish_Parida

Servers are often not used to capacity, but if that's a problem for you, then limit the number of parallel sessions to something acceptable.
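
One simple way to impose such a limit is to launch the child sessions in batches and wait for each batch before starting the next. The sketch below assumes a data set `WORK.XMLFILES` with one row per file in a variable `FNAME`, a hypothetical worker program `import_one.sas` that reads `&SYSPARM`, placeholder paths, and XCMD enabled; the macro name and its parameters are illustrative only.

```sas
/* Assumptions: WORK.XMLFILES holds one file name per row in FNAME, and */
/* import_one.sas is a hypothetical worker that reads &SYSPARM.         */
%macro run_capped(list=work.xmlfiles, dir=/data/xml, maxpar=4);
  %local i n tasks;

  /* Load the file names into local macro variables F1, F2, ... */
  proc sql noprint;
    select fname into :f1 - from &list;
  quit;
  %let n = &sqlobs;
  %let tasks = ;

  %do i = 1 %to &n;
    systask command
      "sas -sysin /data/code/import_one.sas -sysparm '&dir/&&f&i' -log /data/logs/import_&i..log"
      taskname=t&i nowait;
    %let tasks = &tasks t&i;

    /* After every MAXPAR launches, or after the last file, wait for the batch. */
    %if %sysfunc(mod(&i, &maxpar)) = 0 or &i = &n %then %do;
      waitfor _all_ &tasks;
      %let tasks = ;
    %end;
  %end;
%mend run_capped;

%run_capped(maxpar=4)
```

Batching is easy to reason about, but each batch runs only as fast as its slowest file; a scheme that refills a slot as soon as any child finishes would keep the server busier at the cost of more bookkeeping.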

☑ This topic is solved.
