How to Read XML with .txt extension

Occasional Contributor
Posts: 10

Hi,

 

I was told the dataset is an ASCII text file made up of XML records, which confuses me. It does not look like a regular XML file, and I failed to read it with:

proc import datafile = 'data.txt' out=mydata dbms=dlm replace;
getnames=yes;
run;

 

The first two lines:

 

<personExport><personId>123456</personId><Id>123456</Id><XmlRecord><entityObject> <person eprsId="123456" eprsVersion="0" moduleId="109"> <addresses> <address locale="9e" type="yzN5D"> <stAddr1>4@Qd!qa4NW2@0-Eoe425</stAddr1> <stAddr2>au@Uk</stAddr2> <city>aZ?7zjWH?p</city> <state>nb</state> <postalCode>7LA@8SQd?e</postalCode> <country>pTWWihi-xs</country> </address> </addresses> <disabilityCodes> <disabilityCode>466</disabilityCode> <disabilityCode>802</disabilityCode> </disabilityCodes> <name> <family>iNCkfiONdU</family> <given>4-rOv9gdtG</given> <middle>e</middle>  <suffix>95XRA2id@t</suffix> </name> <message>None</message> </person></entityObject></XmlRecord></personExport>

 

<encounterExport><encounterId>abcdefg123</encounterId><personId>123456</personId><Id>abcdefg123</Id><XmlRecord><entityObject personURI="/11777/person/d434b5bf.xml"> <encounter eprsId="abcdefg123"  eprsVersion="0" moduleId="109" personId="d434b5bf"> <admission> <admitDate>2020-12-05T14:25:11.2277943-07:00</admitDate>   <referralSourceCode>77270</referralSourceCode>  </admission> </encounter></entityObject></XmlRecord><PersonId>123456</PersonId></encounterExport>

 

So far I know the data is hierarchical and that each line begins with <personExport>, <encounterExport>, or another record type. The personId values are also the same across the two lines. I don't have an XML map either.

 

I'm not sure whether SAS can read it. It is not an XML file, but it contains XML records.

 

Any thoughts will be appreciated! 


Accepted Solutions
Solution
‎09-20-2017 03:12 PM
Frequent Contributor
Posts: 134

Re: How to Read XML with .txt extension

Ok, it is not an XML file then, simple enough. It is a collection of XML snippets. 

 

I normally would use C# and transform it since it can handle the way that this is structured. However, in the SAS world, some suggestions:

 

1. See if you can add the XML declaration at the top and wrap everything in a single root element such as <Records>, opened at the top and closed at the bottom. That would make it a well-formed XML file. It still doesn't solve your issue, but it makes the file easier to read into SAS.

 

2. Rip it using a data step. Ignore the fact that the records are wrapped in XML and just use regex to parse them out: read a record, parse it, output to a dataset.
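
A minimal sketch of suggestion 1, done outside SAS in Python (in the same spirit as the C# preprocessing mentioned above). The function name `wrap_records`, the `<Records>` root name, and the shortened sample records are illustrative assumptions, not part of the original data or of any SAS tool. It writes an XML declaration and an opening root tag, copies every non-blank record line, and closes the root.

```python
import io
import xml.etree.ElementTree as ET


def wrap_records(src, dst, root="Records"):
    """Copy line-per-record XML snippets from src to dst inside one root element."""
    dst.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    dst.write(f"<{root}>\n")
    for line in src:
        line = line.strip()
        if line:  # skip the blank separator lines between records
            dst.write(line + "\n")
    dst.write(f"</{root}>\n")


# Two shortened records in the same shape as the posted data (hypothetical sample).
sample = io.StringIO(
    "<personExport><personId>123456</personId></personExport>\n"
    "\n"
    "<encounterExport><personId>123456</personId></encounterExport>\n"
)
wrapped = io.StringIO()
wrap_records(sample, wrapped)

# The wrapped text now parses as one document whose children are the records.
doc = ET.fromstring(wrapped.getvalue().encode("utf-8"))
child_tags = [child.tag for child in doc]
```

For a real file, the same function can be called with two open file objects (for example a reader on data.txt and a UTF-8 writer on a new file). As the answer says, this only makes the file well-formed; the mixed record types still have to be mapped to tables afterwards.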

 

This is a format I have seen before, but I take it elsewhere to handle. If you want to keep it 100% in the SAS world, you will probably have to unwind it yourself.
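
The "read a record, parse it, output" loop from suggestion 2 can be sketched as follows. This is a Python illustration of the idea rather than a SAS data step; the pattern names, the `parse_line` helper, and the choice of fields are assumptions made for the example. Regex works for simple, non-repeating fields like these, but it gets brittle for nested or repeated elements such as <disabilityCode>.

```python
import re

# The first tag on the line names the record type, e.g. personExport.
RECORD_TYPE = re.compile(r"^\s*<(\w+)>")
# First <personId> element on the line; matching is case-sensitive,
# so the trailing <PersonId> in encounter records is not picked up.
PERSON_ID = re.compile(r"<personId>([^<]*)</personId>")


def parse_line(line):
    """Return a dict for one record line, or None for blank or unrecognised lines."""
    type_match = RECORD_TYPE.match(line)
    if type_match is None:
        return None
    id_match = PERSON_ID.search(line)
    person_id = id_match.group(1) if id_match else None
    return {"record_type": type_match.group(1), "person_id": person_id}


# Shortened versions of the two posted lines (hypothetical sample).
lines = [
    "<personExport><personId>123456</personId><Id>123456</Id></personExport>",
    "",
    "<encounterExport><encounterId>abcdefg123</encounterId>"
    "<personId>123456</personId><PersonId>123456</PersonId></encounterExport>",
]
rows = [row for row in map(parse_line, lines) if row is not None]
```

Each dict in `rows` corresponds to one output observation; further fields would need their own patterns.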



All Replies
Super User
Posts: 21,464

Re: How to Read XML with .txt extension

It's an XML file. The extension doesn't really matter: these are all text files anyway, and the extension only tells you what's inside (csv, json, xml, html).

 

Look at the libname XML.

 

http://support.sas.com/resources/papers/proceedings12/253-2012.pdf

Occasional Contributor
Posts: 10

Re: How to Read XML with .txt extension

Thanks for your quick reply!

I read this paper and tried its programs before, but got errors:

17136  data dat.data;
17137   set myxml.data;
ERROR: Expected a comment or processing instruction.
ERROR: Encountered during XMLMap parsing at or near line 2, column 1.
ERROR: XML describe error: Internal processing error.
17138  run;

My main concern is that the text file doesn't have:

<?xml version="1.0" encoding="windows-1252" ?> 

Therefore I didn't consider it an XML file. Please correct me if I am wrong. Thanks!

Super User
Posts: 21,464

Re: How to Read XML with .txt extension

Where are the libname statements?

Occasional Contributor
Posts: 10

Re: How to Read XML with .txt extension

libname myxml xml 'Z:\data.txt';
LIBNAME DAT "C:\mylibrary";

There are no issues with libname. Thanks.

Super User
Posts: 21,464

Re: How to Read XML with .txt extension

Can you please post a text file that includes exactly what you have, maybe 10 records? You can mask any fields as needed and then post the remaining content. It really helps to have the actual file to work with.

 

SAS can process XML files and this is XML. 

 

What version of SAS do you have? The XML libraries are relatively new.

Frequent Contributor
Posts: 134

Re: How to Read XML with .txt extension

I can't say whether it is an XML file or not. It may have records that contain XML, which is not the same as an XML file.

 

Does the file have an XML identifier as the very first line in the file? Something like this:

 

<?xml version="1.0" encoding="utf-16"?>

 

It also must have a single grouping element at the top.

 

Can you send us the top 10 lines in the file?
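
The single-root requirement can be checked directly. Note that under XML 1.0 the declaration line is optional, while exactly one root element is mandatory, so the root element is the stricter test. Below is a small Python sketch, assuming shortened versions of the posted lines; the helper name `is_well_formed` is made up for the example, and it says nothing about what the SAS XML engines themselves accept.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the two posted record lines (hypothetical sample).
lines = [
    "<personExport><personId>123456</personId></personExport>",
    "<encounterExport><personId>123456</personId></encounterExport>",
]


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The file as a whole has two root elements, so it is not one XML document.
whole_file_ok = is_well_formed("\n".join(lines))
# Each line on its own has exactly one root element.
each_line_ok = [is_well_formed(line) for line in lines]
```

Here `whole_file_ok` is False while every entry of `each_line_ok` is True, which is the "records that contain XML" situation described above.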

 

 

 

Occasional Contributor
Posts: 10

Re: How to Read XML with .txt extension

No, it does not have something like: <?xml version="1.0" encoding="utf-16"?> This is one of my concerns as well.

 

The first two lines I already posted are exactly what I have. I only truncated some details and didn't change the structure of the data.

I am not sure whether SAS can import the file. If it can't, that's fine; I just need to confirm it.

 

Thanks!

 


New User
Posts: 1

Re: How to Read XML with .txt extension

To read the XML, you can rename the file and open it in a browser. These XML tools may also help to read and beautify it.

  1. https://jsonformatter.org/xml-formatter
  2. https://codebeautify.org/xmlviewer 
Respected Advisor
Posts: 4,274

Re: How to Read XML with .txt extension

@wy110

It looks to me like XML-formatted messages or something along those lines.

The XML structures in your two examples are not as simple as you believe. That's not a simple table.

 

Below is sample code that reads the first of your messages. The result is 9 tables.

 

You will have to run this message by message, and without the XSD it's going to take a while to bring the data together.

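/* Assign a temporary fileref to hold one XML message. */
filename tmp temp;

/* Copy the in-line XML below into the temporary file unchanged. */
data _null_;
  file tmp;
  infile datalines truncover;
  input;
  put _infile_;
  datalines;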
<personExport>
  <personId>123456</personId>
  <Id>123456</Id>
  <XmlRecord>
    <entityObject> 
      <person eprsId="123456" eprsVersion="0" moduleId="109"> 
        <addresses> 
          <address locale="9e" type="yzN5D"> 
            <stAddr1>4@Qd!qa4NW2@0-Eoe425</stAddr1> 
            <stAddr2>au@Uk</stAddr2> 
            <city>aZ?7zjWH?p</city> 
            <state>nb</state> 
            <postalCode>7LA@8SQd?e</postalCode> 
            <country>pTWWihi-xs</country> 
          </address> 
        </addresses> 
        <disabilityCodes> 
          <disabilityCode>466</disabilityCode> 
          <disabilityCode>802</disabilityCode> 
        </disabilityCodes> 
        <name> 
          <family>iNCkfiONdU</family> 
          <given>4-rOv9gdtG</given> 
          <middle>e</middle>  
          <suffix>95XRA2id@t</suffix> 
        </name> 
        <message>None</message> 
      </person>
    </entityObject>
  </XmlRecord>
</personExport>
;
run;

/* Let the XMLV2 engine generate an XMLMap for the message and expose the resulting tables. */
filename map temp;
libname test xmlv2 "%sysfunc(pathname(tmp))" automap=replace xmlmap="%sysfunc(pathname(map))";
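
To get a feel for why one message fans out into several tables, it can help to flatten a record into path/value rows first. The Python sketch below is only an illustration of that idea and is not what the XMLV2 automap does internally; the `flatten` helper and the shortened record are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def flatten(elem, path=""):
    """Yield (path, value) pairs for every attribute and non-empty text node under elem."""
    here = f"{path}/{elem.tag}"
    for name, value in elem.attrib.items():
        yield f"{here}/@{name}", value
    text = (elem.text or "").strip()
    if text:
        yield here, text
    for child in elem:
        yield from flatten(child, here)


# Shortened person record in the same shape as the posted message (hypothetical sample).
record = (
    "<personExport><personId>123456</personId><XmlRecord><entityObject>"
    '<person eprsId="123456"><disabilityCodes>'
    "<disabilityCode>466</disabilityCode><disabilityCode>802</disabilityCode>"
    "</disabilityCodes></person></entityObject></XmlRecord></personExport>"
)
rows = list(flatten(ET.fromstring(record)))
```

Repeated paths, such as the two disabilityCode rows, are the parts that need their own child table keyed back to the person, which is why the structure is not a single flat table.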


☑ This topic is solved.
