Hi,
I'm trying to remove any key words ('MANUAL', 'MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED') from a string (policy_number).
Here is the data:
data test_set ;
infile datalines truncover;
input policy_number $100. ;
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Here's the code:
data chk;
set test_set;
retain new_policy_number ;
array nn[*] $ _all_ ('MANUALENTRY', 'MANUAL', 'CEDEDMANUAL', 'MANUALCEDED');
do i=1 to dim(nn);
new_policy_number = tranwrd(upcase(compress(tranwrd(policy_number,'_',''))), nn[i], "") ;
end;
drop i;
run;
Here are the results:
policy_number | new_policy_number |
Q2 2017 Manual Entry | Q22017MANUALENTRY |
00123400 | 00123400 |
Manual | MANUAL |
Manual_fix15 | MANUALFIX15 |
Manual_fix22Q1 | MANUALFIX22Q1 |
Manual All Data | MANUALALLDATA |
0098765401CededManual | 0098765401CEDEDMANUAL |
0098765401Manual | 0098765401MANUAL |
0011223301 CededManual | 0011223301CEDEDMANUAL |
None of the substrings were removed. Here's what they should be:
policy_number | Should be |
Q2 2017 Manual Entry | Q22017 |
00123400 | 00123400 |
Manual | |
Manual_fix15 | FIX15 |
Manual_fix22Q1 | FIX22Q1 |
Manual All Data | ALLDATA |
0098765401CededManual | 0098765401 |
0098765401Manual | 0098765401 |
0011223301 CededManual | 0011223301 |
I thought that "new_policy_number" might be overwritten with each iteration, so I tried creating 'i' different (4) new_policy_number variables for each element of the array. That didn't work either. Please let me know what you think. Thanks.
data test_set ;
infile datalines truncover;
input policy_number $100. ;
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Some corrections:
data want;
set test_set;
length new_policy_string $ 100;
array words[4] $ 15 _temporary_ ('MANUALENTRY', 'CEDEDMANUAL', 'MANUALCEDED', 'MANUAL');
new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
do i = 1 to dim(words);
new_policy_number = transtrn(new_policy_number, trim(words[i]), '');
end;
drop i;
run;
@andreas_lds wrote:
...
... new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...
Did you mean:
new_policy_number = compress(upcase(policy_number), '_ ');
In other words: remove the underscores and the spaces.
@Tom wrote:
@andreas_lds wrote:
...
... new_policy_number = compress(transtrn(upcase(policy_number), '_', ''));
...
Did you mean:
new_policy_number = compress(upcase(policy_number), '_ ');
In other words: remove the underscores and the spaces.
Well, yes ..😟
Hi,
you need to stop the loop after a replacement is made and stop overwriting new_policy_number over and over again until you try to replace with 'MANUALCEDED' which always fails.
data test_set ;
infile datalines truncover;
input policy_number $100. ;
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
data chk;
set test_set;
length new_policy_number $100;
retain new_policy_number ;
array nn[4] $50 _temporary_ ('MANUALENTRY' 'MANUAL' 'CEDEDMANUAL' 'MANUALCEDED');
do i=1 to dim(nn);
new_policy_number = upcase(compress(tranwrd(policy_number,'_','')));
if index(new_policy_number,nn[i]) then do;
new_policy_number = tranwrd(new_policy_number,nn[i], '') ;
leave;
end;
end;
drop i;
run;
Please also note that in obs #7 you're first replacing with manual since it comes first in your array and not 'cededmanual'
- Cheers -
data test_set ;
infile datalines truncover;
input policy_number $100. ;
want=prxchange('s/MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,compress(policy_number,'_','s'));
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Your solution is close. As Andreas_ids points out, "MANUAL" should be checked last. Wrap it in an upcase and that will work.
What's interesting is that case doesn't matter to the prxchange function; the key words are all caps and the string is not capitalized, yet the key words are removed from the string.
Follow-up question regarding prxchange; there are a couple of other operations performed on policy_number to remove leading zeros and any characters that are not alphanumeric.
prxchange('s/^0+//o',-1,prxchange('s/[^A-Z 0-9]//i',-1,policy_number))
I tried combining the two prxchange operations into one but couldn't get it to work. Can the two prxchange statements be combined into one?
Thanks
Once for all.
data test_set ;
infile datalines truncover;
input policy_number $100. ;
want=prxchange('s/^0+|[^a-z\d]|MANUALENTRY|MANUAL|CEDEDMANUAL|MANUALCEDED//i',-1,strip(policy_number));
datalines;
Q2 2017 Manual Entry
000123400
Manual
Manual_fix15
Manual_fix22Q1
Manual All Data
0098765401CededManual
0098765401Manual
0011223301 CededManual
;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.