BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kb011235
Obsidian | Level 7

Hi,

I'm trying to remove any key words ('MANUAL', 'MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED') from a string (policy_number).

 

Here is the data:

data test_set ;
infile datalines truncover;
input policy_number $100. ;
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run; 

Here's the code:


data chk;
  set test_set;

	retain new_policy_number ;
	
	array nn[*] $ _all_ ('MANUALENTRY', 'MANUAL', 'CEDEDMANUAL', 'MANUALCEDED');

	do i=1 to dim(nn);

		new_policy_number = tranwrd(upcase(compress(tranwrd(policy_number,'_',''))), nn[i], "") ;

	end;

	drop i;

run;

 

Here are the results:

policy_number new_policy_number
Q2 2017 Manual Entry Q22017MANUALENTRY
00123400 00123400
Manual MANUAL
Manual_fix15 MANUALFIX15
Manual_fix22Q1 MANUALFIX22Q1
Manual All Data MANUALALLDATA
0098765401CededManual 0098765401CEDEDMANUAL
0098765401Manual 0098765401MANUAL
0011223301 CededManual 0011223301CEDEDMANUAL

 

None of the substrings were removed. Here's what they should be:

policy_number Should be
Q2 2017 Manual Entry Q22017
00123400 00123400
Manual  
Manual_fix15 FIX15
Manual_fix22Q1 FIX22Q1
Manual All Data ALLDATA
0098765401CededManual 0098765401
0098765401Manual 0098765401
0011223301 CededManual 0011223301

 

I thought that "new_policy_number" might be overwritten with each iteration, so I tried creating 'i' different (4) new_policy_number variables for each element of the array. That didn't work either. Please let me know what you think. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines; 
Q2 2017 Manual Entry 
000123400 
Manual 
Manual_fix15 
Manual_fix22Q1 
Manual All Data 
0098765401CededManual 
0098765401Manual 
0011223301 CededManual 
; 
run; 

View solution in original post

8 REPLIES 8
andreas_lds
Jade | Level 19

Some corrections:

  1. The word "manual" is part of "manualentry" and "cedemanual", so if "manual" is removed first, "cede" and "entry" would stay in the string.
  2. Using "_all_" in the array statement is not what you want, as it includes "policy_number" also.
  3. Upcase and removing blanks and underscores was moved before the loop.
  4. Using "policy_number" in the loops restores already removed words.
  5. Changed from tranwrd to transtrn (see docs) and added trim to get rid of trailing blanks.
data want;
   set test_set;

   length new_policy_string $ 100;
   array words[4] $ 15 _temporary_ ('MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED', 'MANUAL');
   
   new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
   
   do i = 1 to dim(words);
      new_policy_number = transtrn(new_policy_number, trim(words[i]), '');
   end;
   
   drop i;
run;
Tom
Super User Tom
Super User

@andreas_lds wrote:
...
... 
   new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...

 

 Did you mean:

new_policy_number = compress(upcase(policy_number), '_ ');

In other words: remove the underscores and the spaces.

andreas_lds
Jade | Level 19

@Tom wrote:

@andreas_lds wrote:
...
... 
   new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...

 

 Did you mean:

new_policy_number = compress(upcase(policy_number), '_ ');

In other words: remove the underscores and the spaces.


Well, yes ..😟

Oligolas
Barite | Level 11

Hi,

you need to stop the loop after a replacement is made and stop overwriting new_policy_number over and over again until you try to replace with 'MANUALCEDED' which always fails.

 

data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
datalines; 
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
; 
run; 

data chk;
  set test_set;
   length new_policy_number $100;
	retain new_policy_number ;
	
	array nn[4] $50 _temporary_ ('MANUALENTRY' 'MANUAL' 'CEDEDMANUAL' 'MANUALCEDED');

	do i=1 to dim(nn);
      new_policy_number = upcase(compress(tranwrd(policy_number,'_','')));
      if index(new_policy_number,nn[i]) then do;
         new_policy_number = tranwrd(new_policy_number,nn[i], '') ;
         leave;
      end;
	end;

	drop i;
run;

Please also note that in obs #7 you're first replacing with manual since it comes first in your array and not 'cededmanual'

 

 

________________________

- Cheers -

Ksharp
Super User
data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines; 
Q2 2017 Manual Entry 
000123400 
Manual 
Manual_fix15 
Manual_fix22Q1 
Manual All Data 
0098765401CededManual 
0098765401Manual 
0011223301 CededManual 
; 
run; 
kb011235
Obsidian | Level 7

Your solution is close. As Andreas_ids points out, "MANUAL" should be checked last. Wrap it in an upcase and that will work.

 

What's interesting is that case doesn't matter to the prxchange function; the key words are all caps and the string is not capitalized, yet the key words are removed from the string.

 

Follow-up question regarding prxchange; there are a couple of other operations performed on policy_number to remove leading zeros and any characters that are not alphanumeric. 

prxchange('s/^0+//o',-1,prxchange('s/[^A-Z 0-9]//i',-1,policy_number))

I tried combining the two prxchange operations into one but couldn't get it to work. Can the two prxchange statements be combined into one?

 

Thanks

kb011235
Obsidian | Level 7
Thanks everyone for your feedback. I appreciate it!
Ksharp
Super User

Once for all.

 

data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
want=prxchange('s/^0+|[^a-z\d]|MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,strip(policy_number));
datalines; 
Q2 2017 Manual Entry 
000123400 
Manual 
Manual_fix15 
Manual_fix22Q1 
Manual All Data 
0098765401CededManual 
0098765401Manual 
0011223301 CededManual 
; 
run; 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2386 views
  • 0 likes
  • 5 in conversation