03-11-2017 10:23 PM
I want to be able to see if the types of data we are sending change over time.
I have a series of records with a range of namespaces that are sent.
1 <keyData siteId="1" id="" lastName="Caser" />
2 <keyData siteId="1" email="email@example.com" />
Sometimes I'll have ID, sometimes I'll have ID, Email, and SSN -- it varies by the record.
Does anyone have experience trying to shred out XML using SAS?
03-12-2017 06:40 AM
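/* Sample data: the two records from the question */
data have;
  input Record XMLData $50.;
  cards;
1 <keyData siteId="1" id="" lastName="Caser" />
2 <keyData siteId="1" email="email@example.com" />
;
run;

/* Output one row per attribute name found in each record */
data want;
  set have;
  /* Match a run of word characters immediately followed by =" */
  pid = prxparse('/\w+(?==")/o');
  s = 1;
  e = length(xmldata);
  call prxnext(pid, s, e, xmldata, p, l);
  do while (p gt 0);
    type = substr(xmldata, p, l);
    output;
    call prxnext(pid, s, e, xmldata, p, l);
  end;
  drop pid s e p l;
run;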
03-12-2017 03:09 PM
In your example each line is a set of space-separated words. The last word of each is "/>" and you apparently want a part of the next-to-last word, namely the part to the left of the = sign. You also want an output record with type='ID' for each incoming record number, regardless of content.
This code is not "xml-aware" in any way, but it does the particular task as you have described it:
data want (keep=record type);
  input record xmltext & $80.;
  length type $20;
  /* Always write an 'ID' row for each record */
  type = 'ID';
  output;
  /* The next-to-last word holds the attribute of interest; keep the part left of "=" */
  next_to_last_word = scan(xmltext, -2, ' ');
  type = scan(next_to_last_word, 1, '=');
  output;
datalines;
1 <keyData siteId="1" id="" lastName="Caser" />
2 <keyData siteId="1" email="firstname.lastname@example.org" />
run;