BookmarkSubscribeRSS Feed
lfabbri
Obsidian | Level 7

Hi everyone

 

I have a Microsoft SQL Table with a field which contains XML text. I need to parse the xml text contained in this field and output the result to another table in SAS Data Integration Studio.

 

Actually I managed to parse the xml from a file on disk using the xml libname engine:

 

libname in xml 'path\to\sample.xml';

data utente;
 set in.utente;
run;

proc print data=utente;
run;

How can I extract the xml text from SQL table field and use that as an input for the libname xml engine and output the result to another table? I need to do that in a Data Integration job.

 

Thanks for any help!

10 REPLIES 10
jklaverstijn
Rhodochrosite | Level 12

You could write the content of that field to a temporary file and present that file to the XML libname engine:

 

filename tempxml temp;

data _null_;
   set sqldb.mytable(obs=1);
   file tempxml;
   put sqlfield;
run;

libname myxml xml xmlfileref=tempxml;

I leave it to you to deal with mutliple rows in your database table, choosing between XML and XMLV2 engines, XML maps and the many many options that come with XML. But I hope this will give you a general direction.

 

Regards,

-- Jan.

lfabbri
Obsidian | Level 7

Hi @jklaverstijn, thanks for your answer.

 

If I understand correctly, your solution writes the entire table column to a file on disk and then use the XML engine to parse that file. I think this will fail since the engine expects one single root node, while here there are many (each row has a separate xml document).

 

Also I would like to do it in memory, row by row, parse the field content and write to output without having to write the xml to disk. Is it possible?

jklaverstijn
Rhodochrosite | Level 12

I wouldn't know how to do it while avoiding going to disk. I have done similar stuff with JSON and and saw only little overhead from that compared to the expensive interpretation of the data. Your case may be different.Maybe PROC GROOVY and xmlslurper can help if that suits you. But I feel you move a burden from the machine to yourself as the code will be more complicated.

 

My example was a single-row use case (see the OBS=1). If you want to introduce that root element and have each row in your input table add a row to the final result that can be done easily (at the risk of oversimplifying your challenge):

 

filename myxml temp;
data _null_;
   file myxml;
   set mydb.mytable end=eod;
   if _n_=1 then then put '<TABLE1)';
   put xmlfield;
   if eod then put '</TABLE1>';
run;

That way you wrap the individual rows in a table element. If you want more detailed guidance then we would need to see some sample data.

 

Hope this helps,

-- Jan.

lfabbri
Obsidian | Level 7

Hi @jklaverstijn, thank you for the answer.

 

My problem with this solution is that adding another nested level to the xml causes the failure of the parsing XML engine, which it seems does not support more than 3 levels deep in the hierarchy (from the documentation "Document can only be three levels deep,
containing a root element, second-level elements corresponding to the table to be read in, and third-level elements
corresponding to the variables").

 

Do you think is possibile to write a procedure like the following:

 

 

For each row in the table:

    1) extract the xml text from the column (only for the current row)
    2) write the xml to a file on disk
    3) load and parse the file with the sas xml engine
    4) write the result to a target table

In this way I won't need to add level to the hierarchy and the xml engine should be to parse the xml row by row.

Apologies I am not very good at SAS syntax thus I am not sure how to properly write in SAS code the procedure above.

 

Thanks for the support

Patrick
Opal | Level 21

@lfabbri wrote:

Hi @jklaverstijn, thank you for the answer.

 

My problem with this solution is that adding another nested level to the xml causes the failure of the parsing XML engine, which it seems does not support more than 3 levels deep in the hierarchy (from the documentation "Document can only be three levels deep,
containing a root element, second-level elements corresponding to the table to be read in, and third-level elements
corresponding to the variables").

 

Do you think is possibile to write a procedure like the following:

 

 

For each row in the table:

    1) extract the xml text from the column (only for the current row)
    2) write the xml to a file on disk
    3) load and parse the file with the sas xml engine
    4) write the result to a target table

 


 

If you manage to write the necessary code for one of the rows in your source table then with DIS you can use the LOOP transformation and iterate over your code once per row in your source table.

Patrick
Opal | Level 21

@lfabbri

Below works for my very simple sample XML structures.If that's not suitable for your data them may be modify the data step creating the sample XMLs to better reflect what you're dealing with.

 

As for DIS implementation: If you can come up with a solution which works for a single row then just use your source table as control table of a Loop transformation and then iterate over the code extracting the XMLs row by row.

data have;
  do id=1 to 2;
    xml_col='
      <?xml version="1.0" encoding="windows-1252" ?>
      <TABLE>
      <TEST>
      <Id>'||cats(id)||'</Id>      
      <Name>Alfred</Name>
      <Age>14</Age>
      </TEST>
      <TEST>
      <Id>'||cats(id)||'</Id>  
      <Name>Alice</Name>
      <Age>13</Age>
      </TEST>
      </TABLE>';
    xml_col=compbl(xml_col);
    output;
  end;
  stop;
run;

filename source 'c:\test\test3.xml';
libname source xmlv2 xmlfileref=source;
data _null_;
  set have;
  file source;
  put xml_col;
run;
libname source clear;
filename source clear;

libname source xmlv2 'c:\test\test3.xml';
data want;
  set source.test;
run;
libname source clear;

 

lfabbri
Obsidian | Level 7

Hi @Patrick, thank you for the answer.

 

My xml data need to be extracted from column records in a sql table:

 

 

libname dbtest sqlsvr
   noprompt="uid=user;
   pwd=password;
   dsn=DBTEST;"
   stringdates=yes schema=DBO;


filename tempfile "C:\SAS\TEMP\temp.xml";

data _null_;
   set dbtest.MY_XML_TABLE(firstobs=1 obs=1);
   file tempfile;
   put XML;
run;

 

This code can read and write one specific XML record to file.

I am not sure how to apply your code to my use-case. Your loop "do id=1 to 2" iterates only on the first 2 rows?

Patrick
Opal | Level 21

@lfabbri

"Your loop "do id=1 to 2" iterates only on the first 2 rows"

That's for creating sample data and not how to read this data. If the data is in a SQL server table or in a SAS table doesn't really matter.

 

"I need to do that in a Data Integration job."

If you're talking about using SAS DI Studio for implementation then do you understand how the out-of-the-box LOOP transformation works? If you've got a single row solution the LOOP transformation allows you to iterate over your source row by row.

 

jklaverstijn
Rhodochrosite | Level 12

It will not fail as you treat every row as a separate document.

 

IMHO, you have two options here:

 

1) You loop over every record (I would use macro for that) and use the XML engine to parse out each individual value as a seperate document and append that to a target table.

2) "Stitch" all the individual values by adding some XML up front and at the end. I would need to know your XML data to figure out how that would look like.

 

The macro approach for option 1 is moderately advanced but it would be a great learning experience.

 

Look on the web for macro processing of every row in a table. It encompasses creating a macro that processes a single XML string where that string is a variable. Than, using a datastep to go through the source table, use call execute() to call that macro for every row.

 

SAS macro's are very powerful. Allow yourself the satisfaction of learning to use them for your specific use case. It will be very gratifying I promise.

 

Hope this helps,

-- Jan.

 

 

Tom
Super User Tom
Super User

Why?

Surely if your remote database allows you to store XML into fields then it also has tools for parsing the XML. 

Why not just have the remote database parse the XML and return the fields you need from it?

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 4607 views
  • 3 likes
  • 4 in conversation