Hi,
I'm trying to remove any key words ('MANUAL', 'MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED') from a string (policy_number).
Here is the data:
data test_set ;infile datalines truncover;
input policy_number $100. ;
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Here's the code:
data chk;
set test_set;
retain new_policy_number ;
array nn[*] $ _all_ ('MANUALENTRY', 'MANUAL', 'CEDEDMANUAL', 'MANUALCEDED');
do i=1 to dim(nn);
new_policy_number = tranwrd(upcase(compress(tranwrd(policy_number,'_',''))), nn[i], "") ;
end;
drop i;
run;
Here are the results:
| policy_number | new_policy_number |
| Q2 2017 Manual Entry | Q22017MANUALENTRY |
| 00123400 | 00123400 |
| Manual | MANUAL |
| Manual_fix15 | MANUALFIX15 |
| Manual_fix22Q1 | MANUALFIX22Q1 |
| Manual All Data | MANUALALLDATA |
| 0098765401CededManual | 0098765401CEDEDMANUAL |
| 0098765401Manual | 0098765401MANUAL |
| 0011223301 CededManual | 0011223301CEDEDMANUAL |
None of the substrings were removed. Here's what they should be:
| policy_number | Should be |
| Q2 2017 Manual Entry | Q22017 |
| 00123400 | 00123400 |
| Manual | |
| Manual_fix15 | FIX15 |
| Manual_fix22Q1 | FIX22Q1 |
| Manual All Data | ALLDATA |
| 0098765401CededManual | 0098765401 |
| 0098765401Manual | 0098765401 |
| 0011223301 CededManual | 0011223301 |
I thought that "new_policy_number" might be overwritten with each iteration, so I tried creating 'i' different (4) new_policy_number variables for each element of the array. That didn't work either. Please let me know what you think. Thanks.
data test_set ;
infile datalines truncover;
input policy_number $100. ;
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Some corrections:
data want;
set test_set;
length new_policy_string $ 100;
array words[4] $ 15 _temporary_ ('MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED', 'MANUAL');
new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
do i = 1 to dim(words);
new_policy_number = transtrn(new_policy_number, trim(words[i]), '');
end;
drop i;
run;
@andreas_lds wrote:
...
... new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...
Did you mean:
new_policy_number = compress(upcase(policy_number), '_ ');
In other words: remove the underscores and the spaces.
@Tom wrote:
@andreas_lds wrote:
...
... new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...
Did you mean:
new_policy_number = compress(upcase(policy_number), '_ ');In other words: remove the underscores and the spaces.
Well, yes ..😟
Hi,
you need to stop the loop after a replacement is made and stop overwriting new_policy_number over and over again until you try to replace with 'MANUALCEDED' which always fails.
data test_set ;
infile datalines truncover;
input policy_number $100. ;
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
data chk;
set test_set;
length new_policy_number $100;
retain new_policy_number ;
array nn[4] $50 _temporary_ ('MANUALENTRY' 'MANUAL' 'CEDEDMANUAL' 'MANUALCEDED');
do i=1 to dim(nn);
new_policy_number = upcase(compress(tranwrd(policy_number,'_','')));
if index(new_policy_number,nn[i]) then do;
new_policy_number = tranwrd(new_policy_number,nn[i], '') ;
leave;
end;
end;
drop i;
run;
Please also note that in obs #7 you're first replacing with manual since it comes first in your array and not 'cededmanual'
- Cheers -
data test_set ;
infile datalines truncover;
input policy_number $100. ;
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Your solution is close. As Andreas_ids points out, "MANUAL" should be checked last. Wrap it in an upcase and that will work.
What's interesting is that case doesn't matter to the prxchange function; the key words are all caps and the string is not capitalized, yet the key words are removed from the string.
Follow-up question regarding prxchange; there are a couple of other operations performed on policy_number to remove leading zeros and any characters that are not alphanumeric.
prxchange('s/^0+//o',-1,prxchange('s/[^A-Z 0-9]//i',-1,policy_number))
I tried combining the two prxchange operations into one but couldn't get it to work. Can the two prxchange statements be combined into one?
Thanks
Once for all.
data test_set ;
infile datalines truncover;
input policy_number $100. ;
want=prxchange('s/^0+|[^a-z\d]|MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,strip(policy_number));
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.