ciphercong
Fluorite | Level 6

Hi guys,

 

I am really not good at using the Perl regular expression functions in the DATA step. Below are several different expressions I wrote for the PRXCHANGE function. The goal is to find every <E T="XXX"></E> tag and remove the tag itself while keeping the content inside it. I would like to know exactly how these expressions differ and which one is best. If anyone can propose a better one, that would be really awesome.

 

(XXX can be any integer)

 

data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/(.+?)(<E.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
C=prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,A);
D=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,A);
put A=/B=C=D=;
run;

 

 

All of them generate the desired result for this string, but none of them is completely satisfactory when I apply the function to the whole file. I have a large file to deal with, and this <E T="XXX"></E> pattern can appear anywhere.

 

Thanks in advance!

3 REPLIES
PGStats
Opal | Level 21

Only specify the part to search for and its replacement; the rest of the string will remain untouched.

 

E=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, A);
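
As a minimal sketch, here is that expression dropped into the test step from the question. The variable name E and the LENGTH statement are illustrative choices, not part of the original post. Because the pattern matches only the tag pair itself, nothing before or after the tag has to be captured and written back.

```sas
data _null_;
  length E $ 300;  /* illustrative length, large enough for the result */
  A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
  /* \1 is the text between <E T="digits"> and </E>; -1 replaces every match */
  E=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/', -1, A);
  put E=;
  /* expected log line:
     E=<P>Test (a) Authority. This part is issued pursuant to 12 U.S.C. 1 et seq., 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.test</P> */
run;
```

Note that (.*?) also accepts an empty tag such as <E T="1"></E>, whereas (.+?) would require at least one character of content.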
PG
ChrisNZ
Tourmaline | Level 20

All these expressions do what you want and give the same result for your string.

The difference between B and C is that B captures any tag starting with <E, while C also requires T= inside the tag.

D is the same as C, except that C requires at least one character before <E, so it won't capture an <E tag that sits at the very start of the string.
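
To make that concrete, here is a small sketch using two made-up test strings (S1 and S2 are hypothetical, not from the original post). S1 starts with a tag, and S2 has two tags directly next to each other. The second case shows a side effect of the trailing (.+?) in B, C and D: it consumes one character after </E>, which can be the < of the next tag.

```sas
data _null_;
  /* hypothetical test strings, chosen to hit the edge cases */
  S1='<E T="1">x</E> y <E T="2">z</E> end';
  S2='<E T="1">a</E><E T="2">b</E>.';

  /* C needs a character before <E, so the leading tag in S1 survives */
  C1=prxchange('s/(.+?)(<E T=.*?>(.+?)<\/E>)(.+?)/\1\3\4/',-1,S1);
  /* D has no leading group, so it handles the tag at the start */
  D1=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,S1);
  /* D's trailing (.+?) swallows the < of the adjacent tag in S2 */
  D2=prxchange('s/(<E T=.*?>(.+?)<\/E>)(.+?)/\2\3/',-1,S2);
  /* matching only the tag pair avoids both problems */
  E2=prxchange('s/<E T="\d+">(.*?)<\/E>/\1/',-1,S2);

  put C1= / D1= / D2= / E2=;
  /* expected log lines:
     C1=<E T="1">x</E> y z end
     D1=x y z end
     D2=a<E T="2">b</E>.
     E2=ab.                     */
run;
```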

 

> propose a better one

 

What is "better"?

What's wrong with these?

 

If you want to restrict the capture to <E T="{digits}">, you could use this:

E=prxchange('s/(<E T="\d+">(.+?)<\/E>)(.+?)/\2\3/',-1,A);

 

[After reading @PGStats's reply: he is right about replacement of course, and his expression is a bit simpler.]

Ksharp
Super User

Is this what you are looking for?

 

data _null_;
A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
B=prxchange('s/<[^<>]+>/ /',-1,A);
put A=/B=;
run;
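
Note that the pattern above replaces every tag, including <P> and </P>, with a blank. If only the E tags should go and no blank should be inserted, a narrower variant along these lines might work. This is a sketch, not part of the original reply, and the variable name F and the pattern itself are assumptions. It removes opening and closing E tags independently, so it does not depend on the two tags being matched as a pair, and it also removes an <E> tag that has no T attribute.

```sas
data _null_;
  length F $ 300;  /* illustrative length */
  A='<P><E T="12">Test </E>(a) <E T="03">Authority.</E> This part is issued pursuant to 12 U.S.C. 1 <E T="99">et seq.,</E> 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.<E T="5">test</E></P>';
  /* <\/?E\b matches "<E" or "</E" as a whole tag name; [^<>]* skips any attributes */
  F=prxchange('s/<\/?E\b[^<>]*>//',-1,A);
  put F=;
  /* expected log line:
     F=<P>Test (a) Authority. This part is issued pursuant to 12 U.S.C. 1 et seq., 12 U.S.C. 24 (Seventh), and 12 U.S.C. 93a.test</P> */
run;
```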
