Hi guys,
I am really not good at using Perl regular expression functions in the DATA step... Below are several expressions I wrote for the PRXCHANGE function. The goal is to find the <E T="XXX"></E> tag and remove it while keeping the content inside it. I wonder what exactly the difference between these expressions is, and whether anyone can advise me on which one is best (or, even better, propose a better one). That would be really awesome.
(XXX can be any integer)
data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/(.+?)(<E.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
C=prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
D=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,A);
put A=/B=C=D=;
run;
All of them generate the desired result here, but none of them is completely satisfactory to me when I apply the function to the whole file (I have a large file to deal with, and this <E T="XXX"></E> pattern can appear anywhere).
Thanks in advance!
Specify only the part to search for and its replacement; the rest of the string will remain untouched:
E=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, A);
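Since the question mentions a whole file, here is a minimal sketch of applying the same substitution record by record. It assumes every <E T="..."> tag opens and closes within a single record (a tag split across two lines would not match), and it uses in-stream DATALINES as a stand-in for the real file. The data set name cleaned, the variables line and clean, and the 400-character length are made up for illustration.

```sas
/* Sketch only: DATALINES stands in for the real input file. */
data cleaned;
  length line clean $400;          /* assumed maximum record length */
  infile datalines truncover;
  input line $char400.;            /* read the whole record, keeping leading blanks */
  /* Same pattern as above: drop the <E T="n"> ... </E> wrapper, keep its content. */
  clean = prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, line);
  put clean=;
datalines;
<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued</P>
<P>pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> and 12 U.S.C. 93a.</P>
;
run;
```

For an actual file, the INFILE statement would point at the file instead of DATALINES, probably with an LRECL= value large enough for the longest record.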
All these expressions do what you want and give the same result for your string.
The difference between B and C is that B matches any tag that starts with <E, while C also requires T= inside the tag.
D is the same as C except that C needs at least one character before <E, so C won't match an <E tag at the very start of the string.
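To make these differences visible, here is a small sketch that runs C, D, and the simpler pattern without surrounding groups on a made-up string. The string is an assumption chosen for illustration, not taken from the real file: it has a tag at the very start, two adjacent tags, and a tag at the very end.

```sas
data _null_;
  length C D E $200;
  /* Hypothetical test string: tag at the start, adjacent tags, tag at the end. */
  A='<E T="1">one</E><E T="2">two</E> and <E T="3">three</E>';
  /* C requires at least one character before and after the tag. */
  C=prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
  /* D requires at least one character after the tag. */
  D=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,A);
  /* E matches only the tag itself, so its position in the string does not matter. */
  E=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/',-1,A);
  put A=/C=/D=/E=;
run;
```

On this string C should leave the first and last tags in place, D should strip only the first tag, and E should return onetwo and three. The trailing (.+?) in C and D consumes one character after each </E>, which breaks an immediately following tag and cannot match at the end of the string. That may be why the original expressions look fine on the sample but miss tags in the full file.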
> propose a better one
What is "better"?
What's wrong with these?
If you want to restrict the match to <E T="{digits}">, you could use this:
E=prxchange('s/(<E T="\d+">(.+?)<\/E>)(.+?)/\2\3/',-1,A);
[After reading @PGStats's reply: he is right about replacement of course, and his expression is a bit simpler.]
Is this what you are looking for?
data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
/* Replaces every tag (anything between < and >), including <P> and </P>, with a blank. */
B=prxchange('s/<[^<>]+>/ /',-1,A);
put A=/B=;
run;