Hi everyone
I have a Microsoft SQL Server table with a field that contains XML text. I need to parse the XML contained in this field and output the result to another table in SAS Data Integration Studio.
So far I have managed to parse XML from a file on disk using the XML libname engine:
libname in xml 'path\to\sample.xml';
data utente;
set in.utente;
run;
proc print data=utente;
run;
How can I extract the XML text from the SQL table field, use it as input for the libname XML engine, and output the result to another table? I need to do this in a Data Integration job.
Thanks for any help!
You could write the content of that field to a temporary file and present that file to the XML libname engine:
filename tempxml temp;
data _null_;
set sqldb.mytable(obs=1);
file tempxml;
put sqlfield;
run;
libname myxml xml xmlfileref=tempxml;
I leave it to you to deal with multiple rows in your database table, choosing between the XML and XMLV2 engines, XML maps, and the many options that come with XML. But I hope this gives you a general direction.
Regards,
-- Jan.
Hi @jklaverstijn, thanks for your answer.
If I understand correctly, your solution writes the entire table column to a file on disk and then uses the XML engine to parse that file. I think this will fail, since the engine expects a single root node, while here there are many (each row holds a separate XML document).
Also, I would like to do it in memory, row by row: parse the field content and write to the output without writing the XML to disk. Is that possible?
I wouldn't know how to do it while avoiding going to disk. I have done similar things with JSON and saw only little overhead from that, compared to the expensive interpretation of the data. Your case may be different. Maybe PROC GROOVY and XmlSlurper can help if that suits you. But I feel you would move a burden from the machine to yourself, as the code will be more complicated.
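For what it's worth, an in-memory parse with PROC GROOVY might look roughly like this. This is an untested sketch: it assumes a SAS 9.3+ session with a Java runtime configured, and the element names are made up for illustration.

```sas
/* Hedged sketch: parse an XML string in memory with Groovy's XmlSlurper.
   Assumes PROC GROOVY is available (SAS 9.3+ with Java configured). */
proc groovy;
   submit;
      /* a hard-coded sample document; in practice the string would come
         from the database field */
      def doc = new XmlSlurper().parseText('<TEST><Name>Alfred</Name><Age>14</Age></TEST>')
      println doc.Name.text() + ' is ' + doc.Age.text()
   endsubmit;
quit;
```

Getting the parsed values back into a SAS table from there takes extra plumbing, which is part of the burden I mentioned.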
My example was a single-row use case (see the OBS=1). If you want to introduce that root element and have each row in your input table add a row to the final result, that can be done easily (at the risk of oversimplifying your challenge):
filename myxml temp;
data _null_;
   file myxml;
   set mydb.mytable end=eod;
   if _n_=1 then put '<TABLE1>';
   put xmlfield;
   if eod then put '</TABLE1>';
run;
That way you wrap the individual rows in a table element. If you want more detailed guidance then we would need to see some sample data.
Hope this helps,
-- Jan.
Hi @jklaverstijn, thank you for the answer.
My problem with this solution is that adding another nesting level to the XML makes the parsing fail: the XML engine apparently does not support a hierarchy more than three levels deep (from the documentation: "Document can only be three levels deep, containing a root element, second-level elements corresponding to the table to be read in, and third-level elements corresponding to the variables").
Do you think it is possible to write a procedure like the following?
For each row in the table:
1) extract the xml text from the column (only for the current row)
2) write the xml to a file on disk
3) load and parse the file with the sas xml engine
4) write the result to a target table
This way I won't need to add a level to the hierarchy, and the XML engine should be able to parse the XML row by row.
Apologies, I am not very good at SAS syntax, so I am not sure how to properly write the procedure above in SAS code.
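In rough pseudo-SAS, what I have in mind for a single row (row 1 here) is something like the following. The table and column names (dbtest.MY_XML_TABLE, XML) are the ones from my setup; the target names are placeholders, and I am not sure the syntax is right:

```sas
filename onexml temp;

/* steps 1 and 2: extract the XML text of one row and write it to a temp file */
data _null_;
   set dbtest.MY_XML_TABLE(firstobs=1 obs=1);
   file onexml;
   put XML;
run;

/* step 3: parse the single document with the XML engine */
libname rowxml xmlv2 xmlfileref=onexml;

/* step 4: append the parsed rows to a target table */
proc append base=work.target data=rowxml.utente force;
run;

libname rowxml clear;
```

I would then need some way to repeat this for every row of the source table.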
Thanks for the support
@lfabbri wrote:
Hi @jklaverstijn, thank you for the answer.
My problem with this solution is that adding another nesting level to the XML makes the parsing fail: the XML engine apparently does not support a hierarchy more than three levels deep (from the documentation: "Document can only be three levels deep, containing a root element, second-level elements corresponding to the table to be read in, and third-level elements corresponding to the variables").
Do you think it is possible to write a procedure like the following?
For each row in the table: 1) extract the XML text from the column (only for the current row) 2) write the XML to a file on disk 3) load and parse the file with the SAS XML engine 4) write the result to a target table
If you manage to write the necessary code for one of the rows in your source table then with DIS you can use the LOOP transformation and iterate over your code once per row in your source table.
Below works for my very simple sample XML structures. If that's not suitable for your data, then maybe modify the data step creating the sample XMLs to better reflect what you're dealing with.
As for the DIS implementation: if you can come up with a solution which works for a single row, then just use your source table as the control table of a Loop transformation and iterate over the code extracting the XMLs row by row.
data have;
do id=1 to 2;
xml_col='
<?xml version="1.0" encoding="windows-1252" ?>
<TABLE>
<TEST>
<Id>'||cats(id)||'</Id>
<Name>Alfred</Name>
<Age>14</Age>
</TEST>
<TEST>
<Id>'||cats(id)||'</Id>
<Name>Alice</Name>
<Age>13</Age>
</TEST>
</TABLE>';
xml_col=compbl(xml_col);
output;
end;
stop;
run;
filename source 'c:\test\test3.xml';
data _null_;
   set have;
   file source;
   put xml_col;
run;
filename source clear;
libname source xmlv2 'c:\test\test3.xml';
data want;
set source.test;
run;
libname source clear;
Hi @Patrick, thank you for the answer.
My xml data need to be extracted from column records in a sql table:
libname dbtest sqlsvr
noprompt="uid=user;
pwd=password;
dsn=DBTEST;"
stringdates=yes schema=DBO;
filename tempfile "C:\SAS\TEMP\temp.xml";
data _null_;
set dbtest.MY_XML_TABLE(firstobs=1 obs=1);
file tempfile;
put XML;
run;
This code can read one specific XML record and write it to a file.
I am not sure how to apply your code to my use case. Does your loop "do id=1 to 2" iterate only over the first 2 rows?
"Your loop "do id=1 to 2" iterates only on the first 2 rows"
That part is just for creating sample data, not for reading it. Whether the data is in a SQL Server table or in a SAS table doesn't really matter.
"I need to do that in a Data Integration job."
If you're talking about using SAS DI Studio for the implementation: do you understand how the out-of-the-box LOOP transformation works? If you've got a single-row solution, the LOOP transformation allows you to iterate over your source row by row.
It will not fail if you treat every row as a separate document.
IMHO, you have two options here:
1) You loop over every record (I would use a macro for that), use the XML engine to parse each individual value as a separate document, and append the result to a target table.
2) "Stitch" all the individual values together by adding some XML up front and at the end. I would need to see your XML data to figure out how that would look.
The macro approach for option 1 is moderately advanced, but it would be a great learning experience.
Look on the web for macro processing of every row in a table. It encompasses creating a macro that processes a single XML string, where that string is a parameter. Then, using a data step to go through the source table, use CALL EXECUTE() to call that macro for every row.
SAS macros are very powerful. Allow yourself the satisfaction of learning to use them for your specific use case. It will be very gratifying, I promise.
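To sketch the pattern (untested against your data; the table, column, and output dataset names are placeholders you would swap for your own):

```sas
/* Hedged sketch of option 1: a macro that handles one row,
   driven once per source row by CALL EXECUTE. */
filename onexml temp;

%macro parse_one(row);
   data _null_;                                  /* write this row's XML field to a temp file */
      set sqldb.mytable(firstobs=&row obs=&row);
      file onexml;
      put sqlfield;
   run;

   libname rowxml xmlv2 xmlfileref=onexml;       /* parse the single document */

   proc append base=work.result data=rowxml.test force;   /* append to the target */
   run;

   libname rowxml clear;
%mend parse_one;

/* drive the macro once for every row of the source table */
data _null_;
   set sqldb.mytable;
   call execute(cats('%parse_one(', _n_, ')'));
run;
```

The generated steps are queued by CALL EXECUTE and run after the driving data step finishes, one complete extract-parse-append cycle per row.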
Hope this helps,
-- Jan.
Why?
Surely if your remote database allows you to store XML into fields then it also has tools for parsing the XML.
Why not just have the remote database parse the XML and return the fields you need from it?
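For example, with explicit pass-through SQL Server could shred the XML itself. This is an untested sketch: the connection options are the ones shown earlier in the thread, the XPath expressions are placeholders, and it assumes the column is of SQL Server's XML data type (a varchar column would need a CAST first):

```sas
/* Hedged sketch: let SQL Server parse the XML via explicit pass-through.
   MY_XML_TABLE, XML, and the XPath expressions are placeholders. */
proc sql;
   connect to sqlsvr (noprompt="dsn=DBTEST;uid=user;pwd=password;");
   create table want as
   select * from connection to sqlsvr
   ( select t.XML.value('(/TABLE/TEST/Name)[1]', 'varchar(50)') as name,
            t.XML.value('(/TABLE/TEST/Age)[1]',  'int')         as age
     from   MY_XML_TABLE t );
   disconnect from sqlsvr;
quit;
```

That keeps all the parsing on the database side and returns plain columns to SAS.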