Hello
How can I remove words from a string if the word length is less than 4?
For example:
For the string "The company LLTX inc" I want to get "company LLTX".
What have you tried so far?
I tried counting the words with COUNTW, looping over the string, extracting each word with SCAN, checking its length with the LENGTH function, and building a new string with the CATX function.
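For reference, the loop approach described above could look roughly like the sketch below. It is only a minimal illustration: the variable names word and i and the $200 lengths are assumptions, and it treats a blank as the only word delimiter.

```sas
data _null_;
  length s n word $200;
  s = "The company LLTX inc";
  n = "";
  /* Walk through the blank-delimited words one at a time */
  do i = 1 to countw(s, " ");
    word = scan(s, i, " ");
    /* Keep only words that are at least 4 characters long */
    if length(word) >= 4 then n = catx(" ", n, word);
  end;
  /* Quoted output makes stray blanks easy to spot in the log */
  put n $quote.;
run;
```

For this input the log should show "company LLTX". Because CATX strips leading and trailing blanks before joining, no extra clean-up is needed with this approach.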
data test;
s = "The company LLTX inc";
n = prxchange("s/\b\w{1,3}\b//", -1, s);
run;
I am almost sure that this problem can be solved with a regular expression, and that you will find code if you search for it.
The answer supplied by @PeterClemmensen is almost there, except that you may get leading blanks and double blanks in your output. They can be removed with LEFT and COMPBL:
data _null_;
s = "The company LLTX inc";
n = left(compbl(prxchange("s/\b\w{1,3}\b//", -1, s)));
put n $quote.;
run;
I changed the demonstration example to a DATA _NULL_ step, as this makes it easier to see the result in the log, with quotes so that you can easily see any leading blanks.
The PRX expression looks for a word boundary (\b), followed by one to three "word" characters (\w, meaning letters, digits and underscores) and then another word boundary. Each match is replaced with nothing.
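To illustrate what counts as a "word" character, here is a small sketch with a made-up test string (the string and variable names are only for demonstration). Digits and underscores count towards the length of a word, so "x9", "ab_" and "id_" are removed while "a1b2" and "long_name" are kept.

```sas
data _null_;
  length s n $200;
  /* Hypothetical test string mixing letters, digits and underscores */
  s = "ab_ x9 data a1b2 id_ long_name";
  /* Remove words of 1 to 3 word characters, then tidy up the blanks */
  n = left(compbl(prxchange("s/\b\w{1,3}\b//", -1, s)));
  put n $quote.;
run;
```

The log should show "data a1b2 long_name".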
If you also want to get rid of other short sets of non-blank characters, such as "f.3" or "1,5", you can change the \w to \S, which matches any non-whitespace character.
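The sketch below compares the two patterns on a made-up string, so treat it as an illustration rather than a tested recipe. One thing to watch out for: because \b also matches next to punctuation inside a longer token, the \S version can eat a token such as "ab.cde" piece by piece ("ab." first, then "cde"). The third pattern is an alternative I am adding here, not part of the answer above: it uses lookarounds, (?<!\S) and (?!\S), to require a blank or the string edge on both sides, assuming your SAS release supports lookaround in PRX.

```sas
data _null_;
  length s n_w n_s n_look $200;
  /* Hypothetical test string with short punctuated tokens and one long one */
  s = "rate f.3 was 1,5 of ab.cde";
  /* \w: only word characters are removed, punctuation is left behind */
  n_w = left(compbl(prxchange("s/\b\w{1,3}\b//", -1, s)));
  /* \S: removes f.3 and 1,5, but also removes ab.cde in two pieces */
  n_s = left(compbl(prxchange("s/\b\S{1,3}\b//", -1, s)));
  /* Lookarounds: only whole blank-delimited tokens of 1 to 3 characters */
  n_look = left(compbl(prxchange("s/(?<!\S)\S{1,3}(?!\S)//", -1, s)));
  put n_w $quote.;
  put n_s $quote.;
  put n_look $quote.;
run;
```

With this input I would expect "rate . , ." for the \w version, "rate" for the \S version, and "rate ab.cde" for the lookaround version.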
@s_lassen good catch.