I work with SAS Data Integration Studio and need extract multiple addresses from text field which contains xml. For security reasons my SAS enviroment is quite closed, I cant use PIPE or FINDNTH(). I dont have access to original XML file aswell.
So far I managed with simple functions as SUBSTR and INDEX to shorten original text field and managed to get first address from textfield but as there more addresses I cant find solution for extracting second or sixth or n-th addresses.
My table looks like and count is how many strings <movementOfGoods> are in text field as how many rows I need for ID.
XML field contains:
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/001</name>
<streetAndNumber>Tähe tn 106a</streetAndNumber>
<postcode>50107</postcode>
<city>Tartu</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/007</name>
<streetAndNumber>Jaama pst 26</streetAndNumber>
<postcode>44311</postcode>
<city>Rakvere</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/008</name>
<streetAndNumber>Linnase tn 6</streetAndNumber>
<postcode>44208</postcode>
<city>Rakvere vald, Aluvere</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/010</name>
<streetAndNumber>Kooli tn 10</streetAndNumber>
<postcode>76102</postcode>
<city>Rummu, Harjumaa</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
</temporaryStorageCustomsDecisionInformation>
</decisionRelatedInformation>
<customsDecisionStatus>01</customsDecisionStatus>
<customsDecisionStatusDate>2019-12-04</customsDecisionStatusDate>
<publicationAgreement>
<agreementForPublication>1</agreementForPublication>
</publicationAgreement>
</customsDecisionAuthorisationData>
At the end I need table which look like
QUALIFICATION_IDENTIFICATION streetAndNumber,city as address
EETSTEE1000EE-2019-INR53390 Tähe tn 106a, Tartu
EETSTEE1000EE-2019-INR53390 Jaama pst 26, Rakvere
EETSTEE1000EE-2019-INR53390 Linnase tn 6, Rakvere vald, Aluvere
EETSTEE1000EE-2019-INR53390 Kooli tn 10, Rummu, Harjumaa
Please help me out
Andres Veer
Here is a way to do it:
data want;
set have;
length streetAndNumber city $200;
pos=1;
do until(0);
pos=find(XML,'<movementOfGoods>',pos);
if pos=0 then leave;
streetAndNumber=scan(substr(XML,find(XML,'<streetAndNumber>',pos)),2,'><');
city=scan(substr(XML,find(XML,'<city>',pos)),2,'><');
output;
pos=find(XML,'</movementOfGoods>',pos);
end;
keep QUALIFICATION_IDENTIFICATION streetAndNumber city;
run;
Please try the below code , I assumed that the xml data is in a txt file
filename sam '~/sample.txt';
proc import datafile=sam out=sample dbms=tab replace;
getnames=no;
run;
data want;
set sample;
retain streetAndNumber;
if prxmatch('m/streetAndNumber/oi',var1)>0 then streetAndNumber=prxchange('s/(.*>)(.*)(<.*)/$2/oi',-1,var1);
if prxmatch('m/city/oi',var1)>0 then cityasaddress=prxchange('s/(.*>)(.*)(<.*)/$2/oi',-1,var1);
QUALIFICATION_IDENTIFICATION='EETSTEE1000EE-2019-INR53390';
if cityasaddress ne '';
drop var1;
run;
Here is a way to do it:
data want;
set have;
length streetAndNumber city $200;
pos=1;
do until(0);
pos=find(XML,'<movementOfGoods>',pos);
if pos=0 then leave;
streetAndNumber=scan(substr(XML,find(XML,'<streetAndNumber>',pos)),2,'><');
city=scan(substr(XML,find(XML,'<city>',pos)),2,'><');
output;
pos=find(XML,'</movementOfGoods>',pos);
end;
keep QUALIFICATION_IDENTIFICATION streetAndNumber city;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.