Hi,
I was told the dataset is an ASCII text file made up of XML records, and I am confused by it. It doesn't look like a regular XML file, and I failed to read it with:
proc import datafile = 'data.txt' out=mydata dbms=dlm replace;
getnames=yes;
run;
The first two lines:
<personExport><personId>123456</personId><Id>123456</Id><XmlRecord><entityObject> <person eprsId="123456" eprsVersion="0" moduleId="109"> <addresses> <address locale="9e" type="yzN5D"> <stAddr1>4@Qd!qa4NW2@0-Eoe425</stAddr1> <stAddr2>au@Uk</stAddr2> <city>aZ?7zjWH?p</city> <state>nb</state> <postalCode>7LA@8SQd?e</postalCode> <country>pTWWihi-xs</country> </address> </addresses> <disabilityCodes> <disabilityCode>466</disabilityCode> <disabilityCode>802</disabilityCode> </disabilityCodes> <name> <family>iNCkfiONdU</family> <given>4-rOv9gdtG</given> <middle>e</middle> <suffix>95XRA2id@t</suffix> </name> <message>None</message> </person></entityObject></XmlRecord></personExport>
<encounterExport><encounterId>abcdefg123</encounterId><personId>123456</personId><Id>abcdefg123</Id><XmlRecord><entityObject personURI="/11777/person/d434b5bf.xml"> <encounter eprsId="abcdefg123" eprsVersion="0" moduleId="109" personId="d434b5bf"> <admission> <admitDate>2020-12-05T14:25:11.2277943-07:00</admitDate> <referralSourceCode>77270</referralSourceCode> </admission> </encounter></entityObject></XmlRecord><PersonId>123456</PersonId></encounterExport>
So far I know it's hierarchical and each line begins with <personExport> or <encounterExport> or some other element. Also, the personId values are the same. I don't have an XML map either.
I'm not sure whether SAS can read it. It is not an XML file, but it contains XML records.
Any thoughts will be appreciated!
Ok, it is not an XML file then, simple enough. It is a collection of XML snippets.
I normally would use C# and transform it since it can handle the way that this is structured. However, in the SAS world, some suggestions:
1. See if you can prepend the XML identifier to the top and wrap the records in a single element such as <Records>, opened at the top and closed at the bottom. That would make it a 'valid' XML file. It still doesn't solve your issue, but it lets you read the file into SAS more easily.
2. Rip it using a data step. Ignore the fact that the records are wrapped in XML and just use regex to parse them out. Read a record, parse it, output to a dataset.
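For option 1, a minimal sketch of the wrapping step might look like the code below. The path Z:\data.txt is the one from the libname posted in this thread. The filerefs rawin, wrapped and map, the libref recs, and the record length of 32767 are assumptions for illustration. The sketch leaves out the XML declaration, which XML 1.0 treats as optional; the single enclosing element is the part the file is missing.

```sas
/* Assumed input path (from the thread) and hypothetical fileref names */
filename rawin   'Z:\data.txt';
filename wrapped temp;

data _null_;
  /* Assumes no record is longer than 32767 bytes */
  infile rawin lrecl=32767 truncover end=eof;
  file wrapped lrecl=32767;
  input;
  if _n_ = 1 then put '<Records>';   /* open the single root element */
  put _infile_;                      /* copy the record unchanged    */
  if eof then put '</Records>';      /* close the root element       */
run;

/* Let the XMLV2 engine generate an XMLMap for the wrapped copy */
filename map temp;
libname recs xmlv2 "%sysfunc(pathname(wrapped))" automap=replace xmlmap="%sysfunc(pathname(map))";
```

Because the records are nested, the generated map will most likely expose several related tables rather than one flat table.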
This is a format I have seen before but I take it elsewhere to handle. If you want to keep it in a SAS world 100%, you probably have to unwind it yourself.
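For option 2, a rough sketch of the data step approach could look like this. It only pulls three fields out of the <personExport> records shown in the question. The output dataset name person_name, the variable lengths, and the record length of 32767 are assumptions, and it takes only the first match of each tag per record, so repeating elements such as <disabilityCode> would need extra handling.

```sas
/* Assumed input path (from the thread); adjust as needed */
filename rawin 'Z:\data.txt';

data person_name (keep=personId family given);
  infile rawin lrecl=32767 truncover;
  length personId $20 family given $40;   /* assumed lengths */
  retain re_id re_family re_given;

  /* Compile the patterns once, on the first iteration */
  if _n_ = 1 then do;
    re_id     = prxparse('/<personId>(.*?)<\/personId>/');
    re_family = prxparse('/<family>(.*?)<\/family>/');
    re_given  = prxparse('/<given>(.*?)<\/given>/');
  end;

  input;

  /* Keep only the lines that start with <personExport> */
  if index(_infile_, '<personExport>') ne 1 then delete;

  /* Capture the text between each pair of tags */
  if prxmatch(re_id, _infile_)     then personId = prxposn(re_id, 1, _infile_);
  if prxmatch(re_family, _infile_) then family   = prxposn(re_family, 1, _infile_);
  if prxmatch(re_given, _infile_)  then given    = prxposn(re_given, 1, _infile_);
run;
```

The same pattern can be repeated in a second data step for the <encounterExport> lines, with personId as the key that ties the two datasets together.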
It's an XML file. The extension doesn't really matter; they're all text files anyway, and the extension only tells you what's inside (csv, json, xml, html).
Look at the XML LIBNAME engine.
http://support.sas.com/resources/papers/proceedings12/253-2012.pdf
Thanks for your quick reply!
I read this paper and tried the programs before, but I got errors:
17136  data dat.data;
17137   set myxml.data;
ERROR: Expected a comment or processing instruction.
ERROR: Encountered during XMLMap parsing at or near line 2, column 1.
ERROR: XML describe error: Internal processing error.
17138  run;
My main concern is that the text file doesn't have:
<?xml version="1.0" encoding="windows-1252" ?>
Therefore I didn't consider it an XML file. Please correct me if I am wrong. Thanks!
Where are the libname statements?
libname myxml xml 'Z:\data.txt';
LIBNAME DAT "C:\mylibrary";
There are no issues with the libname statements. Thanks.
Can you please post a text file that includes exactly what you have, maybe 10 records? You can mask any fields as needed and then post the remaining content. It really helps to have the actual file to work with.
SAS can process XML files and this is XML.
What version of SAS do you have? The XML libraries are relatively new.
I can't say whether it is an XML file or not. It may have records that contain XML, which is not the same as an XML file.
Does the file have an XML identifier as the very first line in the file? Something like this:
<?xml version="1.0" encoding="utf-16"?>
It also must have a single grouping element at the top.
Can you send us the top 10 lines in the file?
No, it does not have something like: <?xml version="1.0" encoding="utf-16"?> This is one of my concerns as well.
I have already posted the first two lines; that's what I have. I only truncated some details and didn't change the structure of the data.
I am not sure if the file can be imported by SAS. If not, that's fine; I just need to know for sure.
Thanks!
To read the XML you can rename the file and open it in a browser. These XML tools may help to read and beautify XML.
Looks to me like XML formatted messages or something in this area.
The XML structures in your two examples are not as simple as you believe. That's not a simple table.
Below is sample code that reads the first of your messages. The result is 9 tables.
You will have to run this message by message, and without the XSD it's going to take a bit of work to bring the data together.
filename tmp temp;   /* temporary file that will hold the XML message */
data _null_;
  file tmp;
  infile datalines truncover;
  input;
  put _infile_;
  datalines;
<personExport>
  <personId>123456</personId>
  <Id>123456</Id>
  <XmlRecord>
    <entityObject> 
      <person eprsId="123456" eprsVersion="0" moduleId="109"> 
        <addresses> 
          <address locale="9e" type="yzN5D"> 
            <stAddr1>4@Qd!qa4NW2@0-Eoe425</stAddr1> 
            <stAddr2>au@Uk</stAddr2> 
            <city>aZ?7zjWH?p</city> 
            <state>nb</state> 
            <postalCode>7LA@8SQd?e</postalCode> 
            <country>pTWWihi-xs</country> 
          </address> 
        </addresses> 
        <disabilityCodes> 
          <disabilityCode>466</disabilityCode> 
          <disabilityCode>802</disabilityCode> 
        </disabilityCodes> 
        <name> 
          <family>iNCkfiONdU</family> 
          <given>4-rOv9gdtG</given> 
          <middle>e</middle>  
          <suffix>95XRA2id@t</suffix> 
        </name> 
        <message>None</message> 
      </person>
    </entityObject>
  </XmlRecord>
</personExport>
;
run;
filename map temp;
libname test xmlv2 "%sysfunc(pathname(tmp))" automap=replace xmlmap="%sysfunc(pathname(map))";
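Once the libname is assigned, the generated tables can be listed and copied out of the XML library. A minimal sketch, assuming the libref test from the code above and that WORK is an acceptable destination:

```sas
/* List the tables that the generated XMLMap exposes */
proc datasets lib=test;
quit;

/* Copy them into WORK so they can be joined with ordinary steps */
proc copy in=test out=work;
run;
```

The generated tables normally carry ordinal key columns that link child rows to their parents, which is what you would join on to bring the data together.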
