DATA Step, Macro, Functions and more

Extract Hidden Data in XML file

Reply
New Contributor
Posts: 3

Extract Hidden Data in XML file

Hi,

 

I've have a lot of XML files that have "hidden" credit bureau data that I want to extract.  Each XML file represents a customer of ours that contains info like address info/employer info/deal structure info/etc. The credit bureau data is "hidden" in one of the attributes of one of the root elements.  This data is extremely useful to my company and I want to be able to extract it.

 

I tried importing the entire xml file (using XML mapper) into SAS but the credit bureau data gets truncated because it's much longer than the max characters aloud.  So, I can't even read the data into SAS to try to parse that field out later on.

 

Can someone give me some ideas on how to solve this issue?  Is there a way to read this field directly into SAS without the other surrounding data?  Not sure how to go about this.  Below is an example of the data I'm working with.    The raw_xml data is the "hidden" data (in bold) (Often over 100,000 characters).  Removed 99% of the data but just wanted to give you an idea of the structure that I'm working with. 

 

Hope this makes sense.  Appreciate any help/advice. 

 

I use Base SAS and xmlv2 to read in the data.

 

<DealDetails>
<cash_down="1000.00" net_trade="0.00" updated_by="" bundle_id=""/>
</DealDetails>
<Customers>
<age="28" create_date="2017-09-30T11:04:47.0000000">
<Reports>
<Report active="True" valid="True" bureau="TU" raw_xml="<Response Score="700"><creditBureau><document>response</document><version>2.18</version><transactionControl>< 
.
.
.
.
</printImage></product></creditBureau></Response>" create_date="" update_date="" updated_by="System [No User Available]"deal_detail_id="" bundle_id=""/>
</Reports>
Super User
Posts: 19,822

Re: Extract Hidden Data in XML file

XML is just text. Read in and search the start/end tags -> <creditBureau> </creditBureau> and only keep the content in between the two. 

 

I think that's going to require using the _infile_ statement instead of XML mapper though. 

 

I would have thought the XML mapper would pull that out though, since it's not really hidden, it's between tags.

Frequent Contributor
Posts: 149

Re: Extract Hidden Data in XML file

You need to extract the data of raw_xml, an attribute of <report>, save it to a new file and use an XML mapper on that file. To extract the text, use the method @Reeza described.
New Contributor
Posts: 3

Re: Extract Hidden Data in XML file

Posted in reply to error_prone
So, I'm a little confused. I can't only read in the raw_xml attribute between the Reports tags? I have 15,000 xml files. I can't manually do this.
Super User
Posts: 19,822

Re: Extract Hidden Data in XML file


wwelch wrote:
. I can't only read in the raw_xml attribute between the Reports tags? 

Why not?

 I have 15,000 xml files. I can't manually do this.


That's why you write a program. Once you have it working for one you can figure out how to do it for 15000. I don't recall seeing a 'manual' suggestion anywhere in the answers. 

New Contributor
Posts: 3

Re: Extract Hidden Data in XML file

Ok thanks Reeza!!!
Ask a Question
Discussion stats
  • 5 replies
  • 98 views
  • 0 likes
  • 3 in conversation