BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kb011235
Obsidian | Level 7

Hi,

I'm trying to remove any key words ('MANUAL', 'MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED') from a string (policy_number).

 

Here is the data:

data test_set ;
infile datalines truncover;
input policy_number $100. ;
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run; 

Here's the code:


data chk;
  set test_set;

	retain new_policy_number ;
	
	array nn[*] $ _all_ ('MANUALENTRY', 'MANUAL', 'CEDEDMANUAL', 'MANUALCEDED');

	do i=1 to dim(nn);

		new_policy_number = tranwrd(upcase(compress(tranwrd(policy_number,'_',''))), nn[i], "") ;

	end;

	drop i;

run;

 

Here are the results:

policy_number new_policy_number
Q2 2017 Manual Entry Q22017MANUALENTRY
00123400 00123400
Manual MANUAL
Manual_fix15 MANUALFIX15
Manual_fix22Q1 MANUALFIX22Q1
Manual All Data MANUALALLDATA
0098765401CededManual 0098765401CEDEDMANUAL
0098765401Manual 0098765401MANUAL
0011223301 CededManual 0011223301CEDEDMANUAL

 

None of the substrings were removed. Here's what they should be:

policy_number Should be
Q2 2017 Manual Entry Q22017
00123400 00123400
Manual  
Manual_fix15 FIX15
Manual_fix22Q1 FIX22Q1
Manual All Data ALLDATA
0098765401CededManual 0098765401
0098765401Manual 0098765401
0011223301 CededManual 0011223301

 

I thought that "new_policy_number" might be overwritten with each iteration, so I tried creating 'i' different (4) new_policy_number variables for each element of the array. That didn't work either. Please let me know what you think. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines; 
Q2 2017 Manual Entry 
000123400 
Manual 
Manual_fix15 
Manual_fix22Q1 
Manual All Data 
0098765401CededManual 
0098765401Manual 
0011223301 CededManual 
; 
run; 

View solution in original post

8 REPLIES 8
andreas_lds
Jade | Level 19

Some corrections:

  1. The word "manual" is part of "manualentry" and "cedemanual", so if "manual" is removed first, "cede" and "entry" would stay in the string.
  2. Using "_all_" in the array statement is not what you want, as it includes "policy_number" also.
  3. Upcase and removing blanks and underscores was moved before the loop.
  4. Using "policy_number" in the loops restores already removed words.
  5. Changed from tranwrd to transtrn (see docs) and added trim to get rid of trailing blanks.
data want;
   set test_set;

   length new_policy_string $ 100;
   array words[4] $ 15 _temporary_ ('MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED', 'MANUAL');
   
   new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
   
   do i = 1 to dim(words);
      new_policy_number = transtrn(new_policy_number, trim(words[i]), '');
   end;
   
   drop i;
run;
Tom
Super User Tom
Super User

@andreas_lds wrote:
...
... 
   new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...

 

 Did you mean:

new_policy_number = compress(upcase(policy_number), '_ ');

In other words: remove the underscores and the spaces.

andreas_lds
Jade | Level 19

@Tom wrote:

@andreas_lds wrote:
...
... 
   new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...

 

 Did you mean:

new_policy_number = compress(upcase(policy_number), '_ ');

In other words: remove the underscores and the spaces.


Well, yes ..😟

Oligolas
Barite | Level 11

Hi,

you need to stop the loop after a replacement is made and stop overwriting new_policy_number over and over again until you try to replace with 'MANUALCEDED' which always fails.

 

data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
datalines; 
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
; 
run; 

data chk;
  set test_set;
   length new_policy_number $100;
	retain new_policy_number ;
	
	array nn[4] $50 _temporary_ ('MANUALENTRY' 'MANUAL' 'CEDEDMANUAL' 'MANUALCEDED');

	do i=1 to dim(nn);
      new_policy_number = upcase(compress(tranwrd(policy_number,'_','')));
      if index(new_policy_number,nn[i]) then do;
         new_policy_number = tranwrd(new_policy_number,nn[i], '') ;
         leave;
      end;
	end;

	drop i;
run;

Please also note that in obs #7 you're first replacing with manual since it comes first in your array and not 'cededmanual'

 

 

________________________

- Cheers -

Ksharp
Super User
data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines; 
Q2 2017 Manual Entry 
000123400 
Manual 
Manual_fix15 
Manual_fix22Q1 
Manual All Data 
0098765401CededManual 
0098765401Manual 
0011223301 CededManual 
; 
run; 
kb011235
Obsidian | Level 7

Your solution is close. As Andreas_ids points out, "MANUAL" should be checked last. Wrap it in an upcase and that will work.

 

What's interesting is that case doesn't matter to the prxchange function; the key words are all caps and the string is not capitalized, yet the key words are removed from the string.

 

Follow-up question regarding prxchange; there are a couple of other operations performed on policy_number to remove leading zeros and any characters that are not alphanumeric. 

prxchange('s/^0+//o',-1,prxchange('s/[^A-Z 0-9]//i',-1,policy_number))

I tried combining the two prxchange operations into one but couldn't get it to work. Can the two prxchange statements be combined into one?

 

Thanks

kb011235
Obsidian | Level 7
Thanks everyone for your feedback. I appreciate it!
Ksharp
Super User

Once for all.

 

data test_set ;
infile datalines truncover; 
input policy_number $100. ; 
want=prxchange('s/^0+|[^a-z\d]|MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,strip(policy_number));
datalines; 
Q2 2017 Manual Entry 
000123400 
Manual 
Manual_fix15 
Manual_fix22Q1 
Manual All Data 
0098765401CededManual 
0098765401Manual 
0011223301 CededManual 
; 
run; 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2235 views
  • 0 likes
  • 5 in conversation