%LET BAD_PART=DUPLICATE|DO NOT USE|DONOT USE|FILE|RECORD|DO NOT ISSUE|DONT USE;
DATA test1a;
SET test1;
ARRAY ORIGINAL [*] $ 40 C_LN C_FN;
ARRAY NAMES[*] $ 40 LN FN ;
DO I = 1 to 2;
NAMES [I]= ORIGINAL [I];
END;
DO I = 1 to 2;
/* turn anything that is NOT a letter into space*/
NAMES[I] = prxchange("s/[^A-Z]/ /",-1,NAMES[I]);
NAMES[I] = prxchange("s/"&BAD_PART"/ /",-1, left(trim(NAMES[I])));
NAMES[I] = STRIP(NAMES[I]);/*removing all front and back space*/
NAMES[I] = COMPBL(NAMES[I]); /*Change multiple blanks into one*/;
END;
DROP I;
RUN;
It won't run. Can someone help me?
Thank you!
Hey @GingerJJ, good to see you again.
What does your test1 dataset look like? I would need to see what's going on there before I or anyone else can help.
If you can, please provide it in a DATALINES statement so that we can reproduce it on our computers. We usually don't like downloading things due to potential risks.
Edit: What happens when you remove the double quotes around &bad_part.?
Thanks,
Michael
Thanks for getting back to me so quick.
It got rid of all the error messages after I remove the "" around &BAD_PART. But, it didn't get rid of the bad parts at all.
Dataline;
C_LN C_FN
DUPLICATE-DO NOT USE John234
DO NOT USE (DUPLICATE) DUPLICATE RECORD
DO NOT USE DUPLICATE FILE [DONOT USE]
DUPLICATE FILE DUPLICATE FILE 'DONT USE'
(DUPLICATE) "DUPLICATE"
Okay, and what do you want it to look like?
Also, here's how you'd post it to us to make things easier:
data test1;
infile datalines delimiter = "," truncover;
input c_ln :$50. c_fn :$50.;
datalines;
DUPLICATE-DO NOT USE, John234
DO NOT USE (DUPLICATE), DUPLICATE RECORD
DO NOT USE DUPLICATE FILE, [DONOT USE]
DUPLICATE FILE DUPLICATE FILE, 'DONT USE'
(DUPLICATE),"DUPLICATE"
;
I want turn everything in the BAD_PART into a space.
But first I want to turn all the special characters and numbers into space.
Here's part 1:
%let bad_part2 = %sysfunc(tranwrd(&bad_part., %str(|), %str( )));
%put &bad_part2.; DUPLICATE DO NOT USE DONOT USE FILE RECORD DO NOT ISSUE DONT USE
data want;
set test1;
c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
c_fn_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_fn));
run;
proc print data = want;
run;
Obs | c_ln | c_fn | c_ln_1 | c_fn_1 |
---|---|---|---|---|
1 | DUPLICATE-DO NOT USE | John234 | DUPLICATE DO NOT USE | John |
2 | DO NOT USE (DUPLICATE) | DUPLICATE RECORD | DO NOT USE DUPLICATE | DUPLICATE RECORD |
3 | DO NOT USE DUPLICATE FILE | [DONOT USE] | DO NOT USE DUPLICATE FILE | DONOT USE |
4 | DUPLICATE FILE DUPLICATE FILE | 'DONT USE' | DUPLICATE FILE DUPLICATE FILE | DONT USE |
5 | (DUPLICATE) | "DUPLICATE" | DUPLICATE | DUPLICATE |
Got it! Will do in the future!
Sorry I didn't make myself clear.
%LET BAD_PART=DUPLICATE|DO NOT USE|DONOT USE|FILE|RECORD|DO NOT ISSUE|DONT USE;
I want to get rid of all the phrases in BAD_PART and turn that into a single space.
eg. turn "DUPLICATE John" into " John"
So is John the only thing left?
data want;
set test1;
c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
c_fn_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_fn));
c_ln_2 = prxchange("s/(&bad_part.)/ /", -1, trim(c_ln_1));
c_fn_2 = prxchange("s/(&bad_part.)/ /", -1, trim(c_fn_1));
run;
It fixed it!
Thank you so much!
c_ln_1 = prxchange("s/[^A-Za-z]+/ /", -1, trim(c_ln));
What is the "+" for in the above statement?
The plus is considered a 'greedy match'. It takes however many combinations of whatever you're requesting. If it finds one character, it will keep that one character. If there are two, it will keep those two. If there are 90, it will keep 90.
@maguiremq wrote:
The plus is considered a 'greedy match'. It takes however many combinations of whatever you're requesting. If it finds one character, it will keep that one character. If there are two, it will keep those two. If there are 90, it will keep 90.
That is more complicated explanation than I would have expected.
+ means match 1 or more.
* means match 0 or more.
Since you mentioned greedy match what would be the way to do a non-greedy match? I don't think there is one for this example but perhaps you could provide a better example that demonstrates a non-greedy match and explain the RegEx symbols needed for that.
NAMES[I] = prxchange("s/[^A-Z]/ /",-1,trim(NAMES[I]));
The above code also serves the purpose without the "+". So when is it absolutely necessary to add the "+"?
Try it and see if there is a difference.
Look carefully at how many spaces are used to replace the non uppercase letters.
69 data test; 70 x='AbcdFGHxyz'; 71 x2 = prxchange("s/[^A-Z]/ /",-1,x); 72 x3 = prxchange("s/[^A-Z]+/ /",-1,x); 73 put (_all_) (=:$quote./); 74 run; x="AbcdFGHxyz" x2="A FGH" x3="A FGH"
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.