BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
r3570
Obsidian | Level 7

Hi All,

 

Please help me with the below.

 

If I have the following value for a variable as below:

 

COMMENT
Point 7 was omitted due to error
Stone was 1mm, hence point 89 was populated
15 point is available in page 2
74-66 point has 45 errors

 

In the above variable COMMENT, I have 4 observations and would like to replace 'point 7', 'point 89', '15 point' and '74-66 point' with some other character without removing the other numbers like 1 in 1mm, 45 etc and my output should be as below.

 

COMMENT
XXX was omitted due to error
Stone was 1mm, hence XXX was populated
XXX is available in page 2
XXX has 45 errors

 

Can anyone help ?

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

A regular expression can do this (with prxchange):

data want;
   set have;
   length masked $ 50;
   
   masked = prxchange('s/((\d+(-\d+)? )?point( \d+(-\d+)?)?)/XXX/i', 1, comment);
run;

View solution in original post

3 REPLIES 3
andreas_lds
Jade | Level 19

A regular expression can do this (with prxchange):

data want;
   set have;
   length masked $ 50;
   
   masked = prxchange('s/((\d+(-\d+)? )?point( \d+(-\d+)?)?)/XXX/i', 1, comment);
run;
ballardw
Super User

@r3570 wrote:

Hi All,

 

Please help me with the below.

 

If I have the following value for a variable as below:

 

COMMENT
Point 7 was omitted due to error
Stone was 1mm, hence point 89 was populated
15 point is available in page 2
74-66 point has 45 errors

 

In the above variable COMMENT, I have 4 observations and would like to replace 'point 7', 'point 89', '15 point' and '74-66 point' with some other character without removing the other numbers like 1 in 1mm, 45 etc and my output should be as below.

 

COMMENT
XXX was omitted due to error
Stone was 1mm, hence XXX was populated
XXX is available in page 2
XXX has 45 errors

 

Can anyone help ?


One of the basic tools for replacing specific sequences of characters is the TRANWRD function.

data example;
   length comment $ 50 ;
   Comment = "Point 7 was omitted due to error";
   comment = tranwrd(comment,'Point 7','Replaced text here:');
run;

The text to replace is the second parameter and the last is the replacement.

 

Cautions:

The comparison is case sensitive. So if you look for "point 7" and the actual value is "Point 7" no match or replacement occurs.

If you replaced text in a variable that results in longer values that the current defined length of the variable the variable will be truncated.

The example provided explicitly allows for a longer value than the bare text in the first assignment of value to comment. If you comment out the Length statement making the variable longer you can see the result.

And the last is that the function will replace ALL occurrences of the search string with the replacement.

You can only provide one search string and one replacement string per Tranwrd function call. If you want to search for multiple different strings you either have one Tranwrd call per search / replacement statement OR get into some more interesting coding such as providing the values in a temporary arrays and searching over the contents of the array.

An example with one longer comment to demonstrate one approach:

data example;
   length comment $ 250 ;
   /* search for values*/
   array s (4) $ 10 _temporary_('Point 7','point 89','15 point','74-66 point');
   /* replacement values*/
   array r (4) $ 10 _temporary_('R for 7','r for 89','r for 15','r for 74-66');
   Comment = "Point 7 was omitted due to error 
 Stone was 1mm, hence point 89 was populated 
 15 point is available in page 2 
 74-66 point has 45 errors";
   do i = 1 to dim(s);
      comment = tranwrd(comment,strip(s[i]),strip(r[i]));
   end;
   drop i;
run;

This is to show how to have a list of different replacements if desired. If you don't need that remove the R array definition and use 'XXX' for the replacement string. The STRIP function is used because by default SAS pads the string value to the defined length (10) with blanks when the array elements are used. The extra blanks mean that the search with the extra blanks would not match. If it did and the R array value is used you may find extra blanks in the middle of the result.

 

Arrays are a basic tool for using some process with multiple values. By default an array would create additional variables in your set but the keyword _temporary_ means the array variables are not written to the data set.

r3570
Obsidian | Level 7

Thank you so much for your detailed explanation.

 

But i have some around 2000 observations with same pattern. So in that case I cannot provide all the patterns in the program and hence was looking for a solution which can replace such patterns with some other character in one go.

 

I do accept tranwrd but it is tedious task for me to apply for more than 1000 observations.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 939 views
  • 2 likes
  • 3 in conversation