prxchange with macro

GingerJJ · Posted 08-20-2021 02:21 PM

ARRAY NAMES[*] $ 40 LN FN ;
DO I = 1 to 2;
NAMES [I]= ORIGINAL [I];

END;
DO I = 1 to 2;
/* turn anything that is NOT a letter into space*/
NAMES[I] = prxchange("s/[^A-Z]/ /",-1,NAMES[I]);
NAMES[I] = prxchange("s/"&BAD_PART"/ /",-1, left(trim(NAMES[I])));
NAMES[I] = STRIP(NAMES[I]);/*removing all front and back space*/
NAMES[I] = COMPBL(NAMES[I]); /*Change multiple blanks into one*/;

END;
DROP I;
RUN;

It won't run. Can someone help me?

Thank you!

@maguiremq

maguiremq · Posted 08-20-2021 02:30 PM

Hey @GingerJJ, good to see you again.

What does your test1 dataset look like? I would need to see what's going on there before I or anyone else can help.

If you can, please provide it in a DATALINES statement so that we can reproduce it on our computers. We usually don't like downloading things due to potential risks.

Edit: What happens when you remove the double quotes around &bad_part.?

Thanks,

Michael

GingerJJ · Posted 08-20-2021 03:13 PM

Thanks for getting back to me so quick.

It got rid of all the error messages after I remove the "" around &BAD_PART. But, it didn't get rid of the bad parts at all.

Dataline;

C_LN C_FN

DUPLICATE-DO NOT USE John234
DO NOT USE (DUPLICATE) DUPLICATE RECORD
DO NOT USE DUPLICATE FILE [DONOT USE]
DUPLICATE FILE DUPLICATE FILE 'DONT USE'
(DUPLICATE) "DUPLICATE"

maguiremq · Posted 08-20-2021 03:30 PM

Okay, and what do you want it to look like?

Also, here's how you'd post it to us to make things easier:

data test1;
infile datalines delimiter = "," truncover;
input c_ln :$50. c_fn :$50.;
datalines;
DUPLICATE-DO NOT USE, John234
DO NOT USE (DUPLICATE), DUPLICATE RECORD
DO NOT USE DUPLICATE FILE, [DONOT USE]
DUPLICATE FILE DUPLICATE FILE, 'DONT USE'
(DUPLICATE),"DUPLICATE"
;

GingerJJ · Posted 08-20-2021 03:32 PM

I want turn everything in the BAD_PART into a space.

But first I want to turn all the special characters and numbers into space.

maguiremq · Posted 08-20-2021 03:48 PM

Here's part 1:

%let bad_part2 = %sysfunc(tranwrd(&bad_part., %str(|), %str( )));

 %put &bad_part2.;
DUPLICATE DO NOT USE DONOT USE FILE RECORD DO NOT ISSUE DONT USE

data want;
	set test1;
		c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
		c_fn_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_fn));
run;

proc print data = want;
run;

Obs	c_ln	c_fn	c_ln_1	c_fn_1
1	DUPLICATE-DO NOT USE	John234	DUPLICATE DO NOT USE	John
2	DO NOT USE (DUPLICATE)	DUPLICATE RECORD	DO NOT USE DUPLICATE	DUPLICATE RECORD
3	DO NOT USE DUPLICATE FILE	[DONOT USE]	DO NOT USE DUPLICATE FILE	DONOT USE
4	DUPLICATE FILE DUPLICATE FILE	'DONT USE'	DUPLICATE FILE DUPLICATE FILE	DONT USE
5	(DUPLICATE)	"DUPLICATE"	DUPLICATE	DUPLICATE

GingerJJ · Posted 08-20-2021 03:51 PM

Got it! Will do in the future!

GingerJJ · Posted 08-20-2021 03:55 PM

Sorry I didn't make myself clear.

I want to get rid of all the phrases in BAD_PART and turn that into a single space.

eg. turn "DUPLICATE John" into " John"

maguiremq · Posted 08-20-2021 04:02 PM

So is John the only thing left?

data want;
	set test1;
		c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
		c_fn_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_fn));
		c_ln_2 = prxchange("s/(&bad_part.)/ /", -1, trim(c_ln_1));
		c_fn_2 = prxchange("s/(&bad_part.)/ /", -1, trim(c_fn_1));
run;

GingerJJ · Posted 08-20-2021 04:06 PM

It fixed it!

Thank you so much!

c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));

What is the "+" for in the above statement?

maguiremq · Posted 08-20-2021 04:14 PM

The plus is considered a 'greedy match'. It takes however many combinations of whatever you're requesting. If it finds one character, it will keep that one character. If there are two, it will keep those two. If there are 90, it will keep 90.

Tom · Posted 08-20-2021 04:25 PM

@maguiremq wrote:

The plus is considered a 'greedy match'. It takes however many combinations of whatever you're requesting. If it finds one character, it will keep that one character. If there are two, it will keep those two. If there are 90, it will keep 90.

That is more complicated explanation than I would have expected.

+ means match 1 or more.

* means match 0 or more.

Since you mentioned greedy match what would be the way to do a non-greedy match? I don't think there is one for this example but perhaps you could provide a better example that demonstrates a non-greedy match and explain the RegEx symbols needed for that.

maguiremq · Posted 08-22-2021 08:29 AM

No, @Tom, you're completely right. I don't know what I was thinking that day. Must have been the end-of-workweek blues.

GingerJJ · Posted 08-20-2021 04:33 PM

NAMES[I] = prxchange("s/[^A-Z]/ /",-1,trim(NAMES[I]));

The above code also serves the purpose without the "+". So when is it absolutely necessary to add the "+"?

Tom · Posted 08-20-2021 04:35 PM

Try it and see if there is a difference.

Look carefully at how many spaces are used to replace the non uppercase letters.

Spoiler

 69         data test;
 70          x='AbcdFGHxyz';
 71          x2 = prxchange("s/[^A-Z]/ /",-1,x);
 72          x3 = prxchange("s/[^A-Z]+/ /",-1,x);
 73          put (_all_) (=:$quote./);
 74         run;
 
 x="AbcdFGHxyz"
 x2="A   FGH"
 x3="A FGH"

69 data test; 70 x='AbcdFGHxyz'; 71 x2 = prxchange("s/[^A-Z]/ /",-1,x); 72 x3 = prxchange("s/[^A-Z]+/ /",-1,x); 73 put (_all_) (=:$quote./); 74 run; x="AbcdFGHxyz" x2="A FGH" x3="A FGH"

prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

Re: prxchange with macro

SAS Innovate 2025: Call for Content

Classroom Training Available!