I have an XML file with a variable that runs to over 100,000 bytes. I would like it to overflow into several variables. Any ideas on how to do that?
If you are able to import the variable into SAS, I would suggest using the SCAN function to break it up!
Here is a similar post with some great examples:
https://stackoverflow.com/questions/47693983/spliting-a-variable-into-multiple-in-sas
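As a rough illustration of the SCAN idea, here is a minimal sketch that splits a character value into one observation per word. The data set name WORDS, the variable names, and the sample string are made up for the example, and it assumes the value has already been read into a variable:

```sas
data words;
    length long_text $200 word $40;
    long_text = "Hi my name is John. I like to work with SAS data.";
    /* COUNTW counts the blank-delimited words; SCAN extracts the i-th one */
    do i = 1 to countw(long_text, " ");
        word = scan(long_text, i, " ");
        output;
    end;
    keep i word;
run;
```

Note that SCAN splits on delimiters, so the pieces follow word boundaries rather than a fixed byte count.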
The problem is how to import such a long variable.
You will have to dissect that data on your own. How you can do that depends on the layout of the file: whether the XML variable sits on a single line of its own, in a separate block, or somewhere within an unbroken stream of data.
Hi Kurt.
Thanks for your comment. I have given an example of the data structure below. It is this part
<TextContent>
very long text...
</TextContent>
that I want to import. The text is a resume in plain text.
<CandidateList xmlns="http://schemas.hr-manager.net/restful/2.0/" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <DocumentList>
    <Document>
      <Id>13289823</Id>
      <Name>CV name</Name>
      <Extension>pdf</Extension>
      <ByteCount>186787</ByteCount>
      <DownloadUrl>
        https://cdn-recruiter.hr-manager.net/Export/Attachments/ViewDocument.aspx?cid=342&q=KQAAWqKhQYsY6i8aT%2byG16dhiBatjCa7wqDpBB%2flEupzmp3BQQ63DZ41fcyCwWN0
      </DownloadUrl>
      <Bytes i:nil="true"/>
      <Type>Cv</Type>
      <TextContent>
        very long text...
      </TextContent>
    </Document>
  </DocumentList>
</CandidateList>
Here is a very simplified example of how I would tackle such an issue:
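%let len=20;   /* length of each text chunk */
data test;
  infile datalines truncover;
  retain id;
  length id 8 text $&len.;
  input tag $30.;                        /* read the line that holds the tag */
  select (tag);
    when ("<id>") do;
      input id;                          /* the value is on the next line */
      input;                             /* skip the closing </id> line */
    end;
    when ('<textcontent>') do;
      input text $&len.. @;              /* first chunk; trailing @ holds the line */
      do until (text = '</textcontent>');
        output;                          /* one observation per chunk */
        input text $&len.. @;
        if lengthn(text) = 0 then do;    /* line exhausted: move to the next one */
          input;
          input text $&len.. @;
        end;
      end;
      input;                             /* release the held closing-tag line */
    end;
    otherwise;                           /* ignore any other line instead of raising an error */
  end;
  drop tag;
datalines;
<id>
1
</id>
<textcontent>
xxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyzzzzzzzzzzzzzz
</textcontent>
;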
Hi Kurt.
Thanks for your reply. I forgot to add that the data in textContent spans multiple lines.
<id>
1
</id>
<textcontent>CV
Hi my name is John.
I like to work with sas data.
You can call me on my phone 445-334-566
See you.
John
</textcontent>
I have updated the code so that it deals correctly with data that follows the tag on the same line.
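%let len=20;   /* length of each text chunk */
data test;
  infile datalines truncover;
  retain id;
  length id 8 text $&len.;
  input line $30.;
  tag = scan(line,1,">") !! ">";         /* everything up to and including the first ">" */
  line = scan(line,2,">");               /* any data that follows the tag on the same line */
  select (tag);
    when ("<id>") do;
      input id;                          /* the value is on the next line */
      input;                             /* skip the closing </id> line */
    end;
    when ('<textcontent>') do;
      if lengthn(line) > 0
      then do;                           /* text started on the tag line itself */
        text = line;
        output;
      end;
      input text $&len.. @;              /* trailing @ holds the line for the next chunk */
      do until (text = '</textcontent>');
        output;                          /* one observation per chunk */
        input text $&len.. @;
        if lengthn(text) = 0 then do;    /* line exhausted: move to the next one */
          input;
          input text $&len.. @;
        end;
      end;
      input;                             /* release the held closing-tag line */
    end;
    otherwise;                           /* ignore any other line instead of raising an error */
  end;
  drop tag line;                         /* helper variables are not needed in the output */
datalines;
<id>
1
</id>
<textcontent>CV
Hi my name is John.
I like to work with sas data.
You can call me on my phone 445-334-566
See you.
John
</textcontent>
;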
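To come back to the original question of overflowing the text into several variables: a minimal sketch, assuming the TEST data set from the step above (one observation per chunk, with ID retained on every row), is to transpose the chunks so that each ID ends up with TEXT1, TEXT2, and so on. The output name WIDE and the prefix are just illustrative choices.

```sas
/* Turn the chunk observations into one row per ID */
/* with variables text1, text2, ... textN.         */
proc transpose data=test out=wide(drop=_name_) prefix=text;
    by id notsorted;   /* NOTSORTED: keep the order in which the IDs were read */
    var text;
run;
```

Since a SAS character variable can hold at most 32,767 bytes, a value of more than 100,000 bytes will always need at least four such variables, however large LEN is set.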