Hi, I am trying to remove part of the text in my XML file. As it is a massive string, I will just show part of it. I want to remove the leading <Row > and <Cell <Data ss:Type="String"> text, so that this:
<Row >
<Cell <Data ss:Type="String"><interview><rulebook><name>ALE_UW</name><version>ALEX UW - 1.1</version></rulebook><createdDate>2017-10-23 13:51:30.554
TURN IT INTO
<interview><rulebook><name>ALE_UW</name><version>ALEX UW - 1.1</version></rulebook><createdDate>2017-10-23 13:51:30.554
The code I have right now (I got it from the internet) looks like this:
data Replace ;
FILE "D:\XMLCol_Replace.xml" RECFM=N;
infile "D\XMLCol_TRY.xml" RECFM=N;
input VAR1 $CHAR1. @;
IF VAR1 eq '<Row > *this is a carriage return which needs to be deleted too; * it only has one variable (which is the string itself);
<Cell <Data ss:Type="String">' THEN
input VAR1 $CHAR1. @;
*end;
put '';
*end;
*else PUT VAR1 $CHAR1. ;
run;
I'd appreciate any pointers, thank you.
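This is easy if you only want to delete these two pieces of text and your file is NOT a stream file (that is, it has normal line terminators).
data _null_ ; * _null_: only the output file is needed, not a dataset;
FILE "D:\XMLCol_Replace.xml";
infile "D:\XMLCol_TRY.xml";
input ;
* drop the <Row > line entirely;
IF _infile_ =: '<Row > ' then delete;
* strip the leading tag text, starting right after its last character;
if _infile_ =: '<Cell <Data ss:Type="String">' THEN _infile_=substr(_infile_,length('<Cell <Data ss:Type="String">')+1);
put _infile_;
run;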
I am just looking back through your questions. To be honest, I think the best course of action for you is to get a proper XML editor. SAS isn't really the tool for XML editing; it is used for manipulating and analysing datasets. There are a lot of free XML editors out there, and a lot of paid ones, which will make editing, updating, and restructuring XML files a lot easier, as they are built specifically for that file type.
RW9, I know man...
But this is an assignment for my class, and for some reason these parts must be done in SAS. Apparently I then need to be able to do this with more observations (more similar strings), and I'm not sure an XML editor would be efficient if it has to be done manually when the observations are in the hundreds... If I'm wrong, please let me know. Thank you.
I've downloaded xmlnotepad to have a go already.
Can you post the actual XML file? Also, where did you get the file? Normally XML files have end-of-line characters, and it's likely that if the file moved from a Unix-based system to Windows, or vice versa, the terminating characters are different. In other words, you should be able to read a line, check it against your rules, and then write it out.
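If the line endings are the problem, one thing to try is the TERMSTR= option on the INFILE statement. This is only a minimal sketch: it assumes the file uses Unix-style LF terminators, and it uses the paths from the question with the drive colon added, which is my assumption.

```sas
data _null_;
  /* TERMSTR=LF assumes Unix-style line endings;
     use TERMSTR=CRLF instead for Windows-style files */
  infile "D:\XMLCol_TRY.xml" termstr=lf lrecl=32767 truncover;
  file "D:\XMLCol_Replace.xml" lrecl=32767;
  input;              /* read one line into _infile_ */
  /* check the line against your rules here before writing it out */
  put _infile_;
run;
```

After it runs, check the notes in the log for the number of records read and the minimum and maximum record length. A single very long record suggests the terminators were still not recognised.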
If you just remove the part
<Row > <Cell <Data ss:Type="String">
then the XML file still contains the closing tags </Cell> and </Row>, so it is no longer well-formed, which makes further processing more difficult.
Maybe it is better to use the XMLV2 libname engine with the proper map file to process the file.
SAS also provides PROC XSL, which lets you apply an XSLT stylesheet to transform your XML file into a different structure.
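As a rough sketch of what those two routes could look like: every path, the map file, the stylesheet name and the table name Row below are hypothetical, and both routes assume the input is well-formed XML.

```sas
/* Route 1: XMLV2 libname engine with an XMLMap.
   AUTOMAP=REPLACE asks SAS to generate the map file; a hand-written map works too.
   The table name "Row" is an assumption and depends on what the map defines. */
filename xmlmap "D:\XMLCol_TRY.map";
libname xmlin xmlv2 "D:\XMLCol_TRY.xml" xmlmap=xmlmap automap=replace;

data work.rows;
  set xmlin.Row;
run;

/* Route 2: PROC XSL with a stylesheet you write yourself.
   strip_wrapper.xsl is a hypothetical XSLT file, not something SAS ships. */
proc xsl in="D:\XMLCol_TRY.xml"
         xsl="D:\strip_wrapper.xsl"
         out="D:\XMLCol_Replace.xml";
run;
```

Either way the XML stays structurally intact, instead of having its opening tags cut off as text.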