Hi All,
Please help me with the below.
If I have the following value for a variable as below:
COMMENT |
Point 7 was omitted due to error |
Stone was 1mm, hence point 89 was populated |
15 point is available in page 2 |
74-66 point has 45 errors |
In the above variable COMMENT, I have 4 observations and would like to replace 'point 7', 'point 89', '15 point' and '74-66 point' with some other character without removing the other numbers like 1 in 1mm, 45 etc and my output should be as below.
COMMENT |
XXX was omitted due to error |
Stone was 1mm, hence XXX was populated |
XXX is available in page 2 |
XXX has 45 errors |
Can anyone help ?
A regular expression can do this (with prxchange):
data want;
set have;
length masked $ 50;
masked = prxchange('s/((\d+(-\d+)? )?point( \d+(-\d+)?)?)/XXX/i', 1, comment);
run;
A regular expression can do this (with prxchange):
data want;
set have;
length masked $ 50;
masked = prxchange('s/((\d+(-\d+)? )?point( \d+(-\d+)?)?)/XXX/i', 1, comment);
run;
@r3570 wrote:
Hi All,
Please help me with the below.
If I have the following value for a variable as below:
COMMENT Point 7 was omitted due to error Stone was 1mm, hence point 89 was populated 15 point is available in page 2 74-66 point has 45 errors
In the above variable COMMENT, I have 4 observations and would like to replace 'point 7', 'point 89', '15 point' and '74-66 point' with some other character without removing the other numbers like 1 in 1mm, 45 etc and my output should be as below.
COMMENT XXX was omitted due to error Stone was 1mm, hence XXX was populated XXX is available in page 2 XXX has 45 errors
Can anyone help ?
One of the basic tools for replacing specific sequences of characters is the TRANWRD function.
data example; length comment $ 50 ; Comment = "Point 7 was omitted due to error"; comment = tranwrd(comment,'Point 7','Replaced text here:'); run;
The text to replace is the second parameter and the last is the replacement.
Cautions:
The comparison is case sensitive. So if you look for "point 7" and the actual value is "Point 7" no match or replacement occurs.
If you replaced text in a variable that results in longer values that the current defined length of the variable the variable will be truncated.
The example provided explicitly allows for a longer value than the bare text in the first assignment of value to comment. If you comment out the Length statement making the variable longer you can see the result.
And the last is that the function will replace ALL occurrences of the search string with the replacement.
You can only provide one search string and one replacement string per Tranwrd function call. If you want to search for multiple different strings you either have one Tranwrd call per search / replacement statement OR get into some more interesting coding such as providing the values in a temporary arrays and searching over the contents of the array.
An example with one longer comment to demonstrate one approach:
data example; length comment $ 250 ; /* search for values*/ array s (4) $ 10 _temporary_('Point 7','point 89','15 point','74-66 point'); /* replacement values*/ array r (4) $ 10 _temporary_('R for 7','r for 89','r for 15','r for 74-66'); Comment = "Point 7 was omitted due to error Stone was 1mm, hence point 89 was populated 15 point is available in page 2 74-66 point has 45 errors"; do i = 1 to dim(s); comment = tranwrd(comment,strip(s[i]),strip(r[i])); end; drop i; run;
This is to show how to have a list of different replacements if desired. If you don't need that remove the R array definition and use 'XXX' for the replacement string. The STRIP function is used because by default SAS pads the string value to the defined length (10) with blanks when the array elements are used. The extra blanks mean that the search with the extra blanks would not match. If it did and the R array value is used you may find extra blanks in the middle of the result.
Arrays are a basic tool for using some process with multiple values. By default an array would create additional variables in your set but the keyword _temporary_ means the array variables are not written to the data set.
Thank you so much for your detailed explanation.
But i have some around 2000 observations with same pattern. So in that case I cannot provide all the patterns in the program and hence was looking for a solution which can replace such patterns with some other character in one go.
I do accept tranwrd but it is tedious task for me to apply for more than 1000 observations.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.