Find below your code with some additional statements to get the values between tags and to get the tag name. It might be a starting point.

Writing the XML to a file and then using the XML libname engine with a MAP file is probably a lot more flexible and needs less code maintenance. You could add a key definition to the XML string, so that you can later join the original data with the data read from the XML file.

/*
 * this regular expression should match any XML tag
 * minimum length of tag is 2 /(<.[^(><.)]+>)/
 */
data have;
  length notes $ 32767;
  notes = '<notes><aa>aacaa</aa><note><date domain="SVR_DATETIME">20140926T113458</date><timeZone>Pacific/Auckland</timeZone><author username="">Jane Doe</author><content>Note One Content Here</content></note><note><date domain="SVR_DATETIME">20141030T133030</date><timeZone>Pacific/Auckland</timeZone><author username="">Joe Bloggs</author><content>Note Two Content In Here</content></note><note><date domain="SVR_DATETIME">20141117T135213</date><timeZone>New Zealand Standard Time</timeZone><author username="">Joe Public</author><content>Note Three yada yada</content></note><note><date domain="SVR_DATETIME">20141117T152544</date><timeZone>New Zealand Standard Time</timeZone><author username="">Sally Citizen</author><content>Note4 Could be really long</content></note></notes>';
  * notes = '<notes displayAsDescending="yes" showtimezone="no"><note><date domain="SVR_DATETIME">20140926T113458</date><timeZone>Pacific/Auckland</timeZone><author username="">Jane Doe</author><content>Note One Content Here</content></note><note><date domain="SVR_DATETIME">20141030T133030</date><timeZone>Pacific/Auckland</timeZone><author username="">Joe Bloggs</author><content>Note Two Content In Here</content></note><note><date domain="SVR_DATETIME">20141117T135213</date><timeZone>New Zealand Standard Time</timeZone><author username="">Joe Public</author><content>Note Three yada yada</content></note><note><date domain="SVR_DATETIME">20141117T152544</date><timeZone>New Zealand Standard Time</timeZone><author username="">Sally Citizen</author><content>Note4 Could be really long</content></note></notes>';

  /* compile the pattern once and search the whole string */
  regxID = prxparse('/(<.[^(><.)]+>)/');
  start = 1;
  stop = length(notes);
  call prxnext(regxID, start, stop, notes, position, length);

  /* prior_position = first character after the previously matched tag */
  prior_position = 1;

  do while (position > 0);
    tag = prxposn(regxID, 1, notes);
    prevPos = prior_position;
    prior_position = position + length;

    /* an end tag starts with "</" */
    endTag = ( tag =: "</" );

    if endTag = 1 then do;
      /* the value is the text between the previous tag and this end tag */
      valueLen = position - prevPos;
      if valueLen > 0 then do;
        length tagValue $ 128;
        tagValue = substr(notes, prevPos, valueLen);
        tagName = scan(tag, 2, "/>");
      end;
    end;

    /* one output row per tag found */
    output;
    call missing( valueLen, tagValue, tagName );
    call prxnext(regxID, start, stop, notes, position, length);
  end;
run;
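As a rough illustration of the XML libname alternative mentioned above, the sketch below writes the XML string to a temporary file and lets the XMLV2 engine generate the map itself. It assumes a SAS release where the XMLV2 engine supports AUTOMAP=, and the fileref names notesxml and notesmap are made up for this example. Treat it as a starting point, not as tested code for your data.

```sas
/* Sketch only: notesxml and notesmap are hypothetical fileref names */
filename notesxml temp;
filename notesmap temp;

/* HAVE holds the same XML string on every row, so write it out once */
data _null_;
  set have(obs=1 keep=notes);
  file notesxml lrecl=32767;
  put notes;
run;

/* the libref has the same name as the fileref, so the engine reads that file; */
/* AUTOMAP=REPLACE writes a generated map to the notesmap fileref             */
libname notesxml xmlv2 automap=replace xmlmap=notesmap;

/* list the tables and columns the generated map exposes */
proc contents data=notesxml._all_;
run;
```

Once the generated map looks right you can save it to a permanent file and edit it by hand, for example to rename columns or to carry a key column that you added to the XML string.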