Hi guys,
I am really not good at using the Perl regular expression functions in the data step... Below are several different expressions I wrote for the PRXCHANGE function. The goal is to find the <E T="XXX"></E> tag and remove it while keeping the content inside the tag. I wonder what exactly the difference between the expressions is, and it would be really awesome if anyone could advise me on which one is best (or, even better, propose a better one).
(XXX can be any integer)
data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/(.+?)(<E.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
C=prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
D=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,A);
put A=/B=C=D=;
run;
All of them generated the desired results, but none of them is completely satisfactory to me when I apply the function to the whole file (I have a large file to deal with, and this <E T="XXX"></E> pattern can appear anywhere).
Thanks in advance!
Specify only the part to search for and its replacement; the rest of the string will remain untouched.
E=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, A);
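Since the question mentions a large file, here is a minimal sketch of applying the same substitution row by row. The dataset name `have`, the variable names `text` and `clean`, and the sample rows are made up for illustration; compiling the pattern once with PRXPARSE and reusing its id is one common way to write this, not the only one.

```sas
/* Hypothetical input: one row per line of the source file */
data have;
   length text $200;
   text = '<P><E T="12">Test </E>(a) <E T="03">Authority.</E></P>'; output;
   text = '<E T="5">starts the line</E> and <E T="99">ends it</E>'; output;
run;

data want;
   set have;
   length clean $200;
   retain rx;
   /* Compile the substitution once, on the first row only */
   if _n_ = 1 then rx = prxparse('s/<E T="\d+">(.*?)<\/E>/$1/');
   /* -1 replaces every occurrence found in the row */
   clean = prxchange(rx, -1, text);
   drop rx;
   put clean=;
run;
```

Because the pattern matches only the tag pair itself, it does not depend on what surrounds the tag, so a tag at the very start of a row is handled the same way as one in the middle.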
All these expressions do what you want and give the same result for your string.
The difference between B and C is that B captures any <E tag, while C also requires T= inside it.
D is the same as C, except that C needs at least one character before <E, so it won't match an <E tag at the very start of the string.
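A small sketch of that start-of-string difference, using a made-up test string `S` (not from the original post) that begins with a tag:

```sas
data _null_;
   length S C D $60;
   /* Hypothetical string whose first character opens an <E> tag */
   S = '<E T="1">first</E> then <E T="2">second</E>.';
   /* C: the leading (.+?) must consume at least one character */
   C = prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/', -1, S);
   /* D: no leading group, so a match can begin at position 1 */
   D = prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/', -1, S);
   put C= / D=;
run;
```

With this input, C should leave the first tag in place and strip only the second, while D strips both.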
> propose a better one
What is "better"?
What's wrong with these?
If you want to restrict the capture to <E T="{digits}">, you could use this:
E=prxchange('s/(<E T="\d+">(.+?)<\/E>)(.+?)/\2\3/',-1,A);
[After reading @PGStats's reply: he is right about replacement of course, and his expression is a bit simpler.]
Is this what you are looking for?
data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/<[^<>]+>/ /',-1,A);
put A=/B=;
run;