BookmarkSubscribeRSS Feed
wwelch
Calcite | Level 5

Hi,

 

I've have a lot of XML files that have "hidden" credit bureau data that I want to extract.  Each XML file represents a customer of ours that contains info like address info/employer info/deal structure info/etc. The credit bureau data is "hidden" in one of the attributes of one of the root elements.  This data is extremely useful to my company and I want to be able to extract it.

 

I tried importing the entire xml file (using XML mapper) into SAS but the credit bureau data gets truncated because it's much longer than the max characters aloud.  So, I can't even read the data into SAS to try to parse that field out later on.

 

Can someone give me some ideas on how to solve this issue?  Is there a way to read this field directly into SAS without the other surrounding data?  Not sure how to go about this.  Below is an example of the data I'm working with.    The raw_xml data is the "hidden" data (in bold) (Often over 100,000 characters).  Removed 99% of the data but just wanted to give you an idea of the structure that I'm working with. 

 

Hope this makes sense.  Appreciate any help/advice. 

 

I use Base SAS and xmlv2 to read in the data.

 

<DealDetails>
<cash_down="1000.00" net_trade="0.00" updated_by="" bundle_id=""/>
</DealDetails>
<Customers>
<age="28" create_date="2017-09-30T11:04:47.0000000">
<Reports>
<Report active="True" valid="True" bureau="TU" raw_xml="<Response Score="700"><creditBureau><document>response</document><version>2.18</version><transactionControl>< 
.
.
.
.
</printImage></product></creditBureau></Response>" create_date="" update_date="" updated_by="System [No User Available]"deal_detail_id="" bundle_id=""/>
</Reports>
5 REPLIES 5
Reeza
Super User

XML is just text. Read in and search the start/end tags -> <creditBureau> </creditBureau> and only keep the content in between the two. 

 

I think that's going to require using the _infile_ statement instead of XML mapper though. 

 

I would have thought the XML mapper would pull that out though, since it's not really hidden, it's between tags.

error_prone
Barite | Level 11
You need to extract the data of raw_xml, an attribute of <report>, save it to a new file and use an XML mapper on that file. To extract the text, use the method @Reeza described.
wwelch
Calcite | Level 5
So, I'm a little confused. I can't only read in the raw_xml attribute between the Reports tags? I have 15,000 xml files. I can't manually do this.
Reeza
Super User

@wwelch wrote:
. I can't only read in the raw_xml attribute between the Reports tags? 

Why not?

 I have 15,000 xml files. I can't manually do this.


That's why you write a program. Once you have it working for one you can figure out how to do it for 15000. I don't recall seeing a 'manual' suggestion anywhere in the answers. 

wwelch
Calcite | Level 5
Ok thanks Reeza!!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1829 views
  • 0 likes
  • 3 in conversation