Did you know that the TRANWRD function, which can replace or remove all occurrences of a given word, has a hidden gotcha?
The gotcha is that if you do not provide a LENGTH statement for a new variable that receives the output of the TRANWRD function, the variable defaults to a length of 200 characters. The inimitable Ron Cody pointed this out in his SUGI 31 paper, "An Introduction to SAS Character Functions."
Here is an example of the gotcha:
data _null_;
   /* newstatement1 is declared up front; newstatement2 is not */
   length newstatement1 $34;
   statement = "I enjoy going to SUGI conferences.";
   newstatement1 = tranwrd(statement, "SUGI", "SGF");
   newstatement2 = tranwrd(statement, "SUGI", "SGF");
   /* LENGTHC returns the storage length, including trailing blanks */
   length_newstatement1 = lengthc(newstatement1);
   length_newstatement2 = lengthc(newstatement2);
   put length_newstatement1=;
   put length_newstatement2=;
run;
If you submit the code above, you will see that newstatement1 ends up with a length of 34, while newstatement2 ends up with a length of 200. That is because newstatement1 had a LENGTH statement specifying its size before it was used with the TRANWRD function. newstatement2, on the other hand, was created fresh when it received the TRANWRD result, and SAS, not knowing what to make of this infant variable, gave it the TRANWRD default of 200 characters.
So, what is the gotcha? If you are not careful to declare the length of a new character variable created with the TRANWRD function, you could end up with larger variables than you intend, needlessly inflating the size of a SAS data set. You have been warned!
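Other functions can also generate longer character variables than you might expect. SUBSTR, for example, gives an undeclared result variable the full length of its source string, no matter how few characters you extract.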
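To see how SUBSTR behaves, here is a minimal sketch that reuses the statement variable from the example above. The names word and length_word are made up for illustration. As I understand the SAS documentation, a variable with no declared length that receives SUBSTR output takes the length of SUBSTR's first argument, so I would expect this to report 34 rather than 4.

```sas
data _null_;
   statement = "I enjoy going to SUGI conferences.";   /* 34 characters */
   /* No LENGTH statement for word (a hypothetical variable name), */
   /* so SAS has to pick a length for it at compile time.          */
   word = substr(statement, 18, 4);                    /* extracts "SUGI" */
   length_word = lengthc(word);                        /* storage length of word */
   put word= length_word=;
run;
```

Adding a statement such as length word $4; before the assignment should keep the variable as small as its contents.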
A secondary issue is truncation: if the value returned by TRANWRD exceeds 200 characters and you have not set a sufficient length for the new variable, the result is cut off at 200 characters.
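The truncation issue can be sketched as follows. This is only an illustration under my own assumptions: the variable names (source, long_default, long_sized) and the test string are hypothetical, and REPEAT is used only to build a 120-character input. Each "ab" is replaced by "abcdef", so the full result needs 280 characters.

```sas
data _null_;
   length source $120 long_sized $280;
   source = repeat("ab ", 39);          /* 40 copies of "ab " = 120 characters */
   /* No LENGTH statement: long_default gets the 200-character default, */
   /* so the 280-character result cannot fit and is truncated.          */
   long_default = tranwrd(source, "ab", "abcdef");
   /* Declared at 280: the whole result fits. */
   long_sized = tranwrd(source, "ab", "abcdef");
   /* LENGTHC = storage length; LENGTH = position of last nonblank character */
   stored_default = lengthc(long_default);
   stored_sized   = lengthc(long_sized);
   used_default   = length(long_default);
   used_sized     = length(long_sized);
   put stored_default= stored_sized=;
   put used_default= used_sized=;
run;
```

If the behavior matches the description above, stored_default should come back as 200 and stored_sized as 280, with the tail of the replaced text missing from long_default.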
Thanks to MMMMIIIIKKKKEEEE for sharing this tip on sasCommunity.org.