ciphercong
Fluorite | Level 6

Hi guys,

 

I am really not good at using the Perl regular expression functions in the data step. Below are several different expressions I wrote for the PRXCHANGE function. The goal is to find the <E T="XXX"></E> tag and remove it while keeping the content inside the tag. I wonder what exactly the difference between the expressions is. If anyone can advise me on which one is best (or, even better, propose a better one), that would be really awesome.

 

(XXX can be any integer)

 

data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/(.+?)(<E.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
C=prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
D=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,A);
put A=/B=C=D=;
run;

 

 

All of them generated the desired result for this string, but none of them is completely satisfactory when I apply the function to the whole file (I have a large file to deal with, and this <E T="XXX"></E> pattern can appear anywhere).

 

Thanks in advance!

3 REPLIES
PGStats
Opal | Level 21

Specify only the part to be matched and its replacement; the rest of the string will remain untouched.

 

E=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, A);
PG
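
For readers who want to try this on the sample string from the question, here is a minimal sketch of a complete data step built around the expression above. The LENGTH statement and the size 300 are my own assumptions, added because a variable that receives a PRXCHANGE result without a declared length defaults to 200 characters, which could truncate longer lines in a real file.

```sas
data _null_;
  /* Assumption: 300 is an arbitrary size chosen to fit this sample string */
  length A E $ 300;
  A = '<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
  /* -1 replaces every match; \1 keeps only the text between the tags */
  E = prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, A);
  put A= / E=;
run;
```

Because the pattern only describes the tag pair itself, it does not depend on what comes before or after the tag, and the surrounding <P>...</P> wrapper is left in place.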
ChrisNZ
Tourmaline | Level 20

All these expressions do what you want and give the same result for your string.

The difference between B and C is that B matches any <E ...> tag, while C also requires T= inside the tag.

D is the same as C, except that C requires at least one character before <E, so C won't match a tag at the very start of the string.
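
The difference only shows up at the edges of the string, so here is a small sketch using a made-up test string (not from the original post) with one tag at the very start and one at the very end. The variable names S, C2, D2 and E2 are hypothetical.

```sas
data _null_;
  length S C2 D2 E2 $ 100;
  /* Hypothetical test string: a tag at the start and a tag at the end */
  S = '<E T="1">Start</E> middle <E T="2">end</E>';
  /* C: needs a character before <E and one after </E>, so neither tag can match */
  C2 = prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/', -1, S);
  /* D: needs a character after </E>, so only the first tag is removed */
  D2 = prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/', -1, S);
  /* Tag-only pattern: no requirement on the surrounding text, so both tags are removed */
  E2 = prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, S);
  put S= / C2= / D2= / E2=;
run;
```

This is probably why the expressions look fine on the sample string, where every tag sits inside <P>...</P>, but can miss tags in a large file where a tag may start or end a line.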

 

> propose a better one

 

What is "better"?

What's wrong with these?

 

If you want to restrict the match to <E T="{digits}">, you could use this:

E=prxchange('s/(<E T="\d+">(.+?)<\/E>)(.+?)/\2\3/',-1,A);

 

[After reading @PGStats's reply: he is right about replacement of course, and his expression is a bit simpler.]

Ksharp
Super User

Is this what you are looking for?

 

data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/<[^<>]+>/ /',-1,A);
put A=/B=;
run;
