I work with SAS Data Integration Studio and need extract multiple addresses from text field which contains xml. For security reasons my SAS enviroment is quite closed, I cant use PIPE or FINDNTH(). I dont have access to original XML file aswell.
So far I managed with simple functions as SUBSTR and INDEX to shorten original text field and managed to get first address from textfield but as there more addresses I cant find solution for extracting second or sixth or n-th addresses.
My table looks like and count is how many strings <movementOfGoods> are in text field as how many rows I need for ID.
XML field contains:
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/001</name>
<streetAndNumber>Tähe tn 106a</streetAndNumber>
<postcode>50107</postcode>
<city>Tartu</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/007</name>
<streetAndNumber>Jaama pst 26</streetAndNumber>
<postcode>44311</postcode>
<city>Rakvere</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/008</name>
<streetAndNumber>Linnase tn 6</streetAndNumber>
<postcode>44208</postcode>
<city>Rakvere vald, Aluvere</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
<movementOfGoods>
<qualifierOfTheIdentification>Z</qualifierOfTheIdentification>
<legalBaseCode>A</legalBaseCode>
<holderEORINumber>
<eORINumber>EEReplacedCodeForSecurityReasons</eORINumber>
</holderEORINumber>
<country>EE</country>
<typeOfLocationCode>B</typeOfLocationCode>
<freeTextDescription>
<name>ReplacedCodeForSecurityReasons/010</name>
<streetAndNumber>Kooli tn 10</streetAndNumber>
<postcode>76102</postcode>
<city>Rummu, Harjumaa</city>
<country>EE</country>
</freeTextDescription>
</movementOfGoods>
</temporaryStorageCustomsDecisionInformation>
</decisionRelatedInformation>
<customsDecisionStatus>01</customsDecisionStatus>
<customsDecisionStatusDate>2019-12-04</customsDecisionStatusDate>
<publicationAgreement>
<agreementForPublication>1</agreementForPublication>
</publicationAgreement>
</customsDecisionAuthorisationData>
At the end I need table which look like
QUALIFICATION_IDENTIFICATION streetAndNumber,city as address
EETSTEE1000EE-2019-INR53390 Tähe tn 106a, Tartu
EETSTEE1000EE-2019-INR53390 Jaama pst 26, Rakvere
EETSTEE1000EE-2019-INR53390 Linnase tn 6, Rakvere vald, Aluvere
EETSTEE1000EE-2019-INR53390 Kooli tn 10, Rummu, Harjumaa
Please help me out
Andres Veer
Here is a way to do it:
data want;
set have;
length streetAndNumber city $200;
pos=1;
do until(0);
pos=find(XML,'<movementOfGoods>',pos);
if pos=0 then leave;
streetAndNumber=scan(substr(XML,find(XML,'<streetAndNumber>',pos)),2,'><');
city=scan(substr(XML,find(XML,'<city>',pos)),2,'><');
output;
pos=find(XML,'</movementOfGoods>',pos);
end;
keep QUALIFICATION_IDENTIFICATION streetAndNumber city;
run;
Please try the below code , I assumed that the xml data is in a txt file
filename sam '~/sample.txt';
proc import datafile=sam out=sample dbms=tab replace;
getnames=no;
run;
data want;
set sample;
retain streetAndNumber;
if prxmatch('m/streetAndNumber/oi',var1)>0 then streetAndNumber=prxchange('s/(.*>)(.*)(<.*)/$2/oi',-1,var1);
if prxmatch('m/city/oi',var1)>0 then cityasaddress=prxchange('s/(.*>)(.*)(<.*)/$2/oi',-1,var1);
QUALIFICATION_IDENTIFICATION='EETSTEE1000EE-2019-INR53390';
if cityasaddress ne '';
drop var1;
run;
Here is a way to do it:
data want;
set have;
length streetAndNumber city $200;
pos=1;
do until(0);
pos=find(XML,'<movementOfGoods>',pos);
if pos=0 then leave;
streetAndNumber=scan(substr(XML,find(XML,'<streetAndNumber>',pos)),2,'><');
city=scan(substr(XML,find(XML,'<city>',pos)),2,'><');
output;
pos=find(XML,'</movementOfGoods>',pos);
end;
keep QUALIFICATION_IDENTIFICATION streetAndNumber city;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.