BookmarkSubscribeRSS Feed
GingerJJ
Obsidian | Level 7

%LET BAD_PART=DUPLICATE|DO NOT USE|DONOT USE|FILE|RECORD|DO NOT ISSUE|DONT USE;

DATA test1a;
SET test1;
ARRAY ORIGINAL [*] $ 40 C_LN C_FN;

ARRAY NAMES[*] $ 40 LN FN ;
DO I = 1 to 2;
NAMES [I]= ORIGINAL [I];

END;
DO I = 1 to 2;
/* turn anything that is NOT a letter into space*/
NAMES[I] = prxchange("s/[^A-Z]/ /",-1,NAMES[I]);
NAMES[I] = prxchange("s/"&BAD_PART"/ /",-1, left(trim(NAMES[I])));
NAMES[I] = STRIP(NAMES[I]);/*removing all front and back space*/
NAMES[I] = COMPBL(NAMES[I]); /*Change multiple blanks into one*/;

END;
DROP I;
RUN;

 

It won't run. Can someone help me?

Thank you! 

@maguiremq

25 REPLIES 25
maguiremq
SAS Super FREQ

Hey @GingerJJ, good to see you again.

 

What does your test1 dataset look like? I would need to see what's going on there before I or anyone else can help.

 

If you can, please provide it in a DATALINES statement so that we can reproduce it on our computers. We usually don't like downloading things due to potential risks.

 

Edit: What happens when you remove the double quotes around &bad_part.?

Thanks,

Michael

GingerJJ
Obsidian | Level 7

Thanks for getting back to me so quick.

It got rid of all the error messages after I remove the "" around &BAD_PART. But, it didn't get rid of the bad parts at all.

 

Dataline;

C_LN                                                                 C_FN

DUPLICATE-DO NOT USE                              John234
DO NOT USE (DUPLICATE)                     DUPLICATE RECORD
DO NOT USE DUPLICATE FILE                [DONOT USE]
DUPLICATE FILE DUPLICATE FILE          'DONT USE'
 (DUPLICATE)                                            "DUPLICATE"

maguiremq
SAS Super FREQ

Okay, and what do you want it to look like?

 

Also, here's how you'd post it to us to make things easier:

data test1;
infile datalines delimiter = "," truncover;
input c_ln :$50. c_fn :$50.;
datalines;
DUPLICATE-DO NOT USE, John234
DO NOT USE (DUPLICATE), DUPLICATE RECORD
DO NOT USE DUPLICATE FILE, [DONOT USE]
DUPLICATE FILE DUPLICATE FILE, 'DONT USE'
(DUPLICATE),"DUPLICATE"
;
GingerJJ
Obsidian | Level 7

I want turn everything in the BAD_PART into a space.

But first I want to turn all the special characters and numbers into space. 

maguiremq
SAS Super FREQ

Here's part 1:

 

%let bad_part2 = %sysfunc(tranwrd(&bad_part., %str(|), %str( )));
 %put &bad_part2.;
DUPLICATE DO NOT USE DONOT USE FILE RECORD DO NOT ISSUE DONT USE

 

 

 

data want;
	set test1;
		c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
		c_fn_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_fn));
run;

proc print data = want;
run;
Obs c_ln c_fn c_ln_1 c_fn_1
1 DUPLICATE-DO NOT USE John234 DUPLICATE DO NOT USE John
2 DO NOT USE (DUPLICATE) DUPLICATE RECORD DO NOT USE DUPLICATE DUPLICATE RECORD
3 DO NOT USE DUPLICATE FILE [DONOT USE] DO NOT USE DUPLICATE FILE DONOT USE
4 DUPLICATE FILE DUPLICATE FILE 'DONT USE' DUPLICATE FILE DUPLICATE FILE DONT USE
5 (DUPLICATE) "DUPLICATE" DUPLICATE DUPLICATE

 

GingerJJ
Obsidian | Level 7

Got it! Will do in the future!

GingerJJ
Obsidian | Level 7

Sorry I didn't make myself clear. 

%LET BAD_PART=DUPLICATE|DO NOT USE|DONOT USE|FILE|RECORD|DO NOT ISSUE|DONT USE;

I want to get rid of all the phrases in BAD_PART and turn that into a single space.

eg. turn "DUPLICATE John" into "  John"

maguiremq
SAS Super FREQ

So is John the only thing left?

data want;
	set test1;
		c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
		c_fn_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_fn));
		c_ln_2 = prxchange("s/(&bad_part.)/ /", -1, trim(c_ln_1));
		c_fn_2 = prxchange("s/(&bad_part.)/ /", -1, trim(c_fn_1));
run;
GingerJJ
Obsidian | Level 7

It fixed it!

Thank you so much!

c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));

What is the "+" for in the above statement?

maguiremq
SAS Super FREQ

The plus is considered a 'greedy match'. It takes however many combinations of whatever you're requesting. If it finds one character, it will keep that one character. If there are two, it will keep those two. If there are 90, it will keep 90.

Tom
Super User Tom
Super User

@maguiremq wrote:

The plus is considered a 'greedy match'. It takes however many combinations of whatever you're requesting. If it finds one character, it will keep that one character. If there are two, it will keep those two. If there are 90, it will keep 90.


That is more complicated explanation than I would have expected. 

+ means match 1 or more. 

* means match 0 or more.

 

Since you mentioned greedy match what would be the way to do a non-greedy match?  I don't think there is one for this example but perhaps you could provide a better example that demonstrates a non-greedy match and explain the RegEx symbols needed for that.

maguiremq
SAS Super FREQ
No, @Tom, you're completely right. I don't know what I was thinking that day. Must have been the end-of-workweek blues.
GingerJJ
Obsidian | Level 7

NAMES[I] = prxchange("s/[^A-Z]/ /",-1,trim(NAMES[I]));

The above code also serves the purpose without the "+". So when is it absolutely necessary to add the "+"?

Tom
Super User Tom
Super User

Try it and see if there is a difference.

Look carefully at how many spaces are used to replace the non uppercase letters.

 

Spoiler

 

 69         data test;
 70          x='AbcdFGHxyz';
 71          x2 = prxchange("s/[^A-Z]/ /",-1,x);
 72          x3 = prxchange("s/[^A-Z]+/ /",-1,x);
 73          put (_all_) (=:$quote./);
 74         run;
 
 x="AbcdFGHxyz"
 x2="A   FGH"
 x3="A FGH"

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 25 replies
  • 3779 views
  • 8 likes
  • 4 in conversation