Dear all,
How can I remove the 'THE' from the 'HRM_L2_Step3'variable which end in 'THE' and start with 'THE'.
for example
for table a
THKDNAWTHE
COIOMMOO THE
THEDONNWWW
THE DOCMESSC
THE DOCMESS THE
THEIMLEMTHE
I expect to get table b
THKDNAWTHE
COIOMMOO
THEDONNWWW
DOCMESSC
DOCMESS
THEIMLEMTHE
Could you please give me some suggestions? thanks in advance
data a;
infile datalines dlm=',' truncover;
informat HRM_L2_Step3 $50.;
input HRM_L2_Step3 ;
datalines;
THKDNAWTHE
COIOMMOO THE
THEDONNWWW
THE DOCMESSC
THE DOCMESS THE
THEIMLEMTHE
run;
Here is one way
data a;
infile datalines dlm=',' truncover;
informat HRM_L2_Step3 $50.;
input HRM_L2_Step3 ;
datalines;
THKDNAWTHE
COIOMMOO THE
THEDONNWWW
THE DOCMESSC
THE DOCMESS THE
THEIMLEMTHE
run;
data b;
set a;
HRM_L2_Step3=prxchange('s/\sTHE|THE\s//',-1,HRM_L2_Step3);
run;
Here is one way
data a;
infile datalines dlm=',' truncover;
informat HRM_L2_Step3 $50.;
input HRM_L2_Step3 ;
datalines;
THKDNAWTHE
COIOMMOO THE
THEDONNWWW
THE DOCMESSC
THE DOCMESS THE
THEIMLEMTHE
run;
data b;
set a;
HRM_L2_Step3=prxchange('s/\sTHE|THE\s//',-1,HRM_L2_Step3);
run;
Dear draycut,
thanks for your advice.
Could you please explain the meaning of '
prxchange('s/\sTHE|THE\s//',-1,
'
I really expect to know how it works.
thanks for your attention to this matter.
Sure, I will give it a try. However, if you are not familiar with regular expressions, some of it may not make sense 🙂
The PRXCHANGE Functions first argument is's/\sTHE|THE\s//'.
The first
s
means that I want to substitute the pattern between the first and second slash with the text between the second and third slash (in this case nothing). The \sTHE means that I search for a pattern matching a whitespace followed by the string "THE". | is an 'or symbol', meaning that I search for a pattern matching either the string "THE" with a blank before OR after it.
The -1 in the next argument means that I keep searching for and replacing the matched patterns until the end of the input string.
Hope this makes sense
I'm glad you found your answer 🙂
Dear draycut,
I face a new problem during the process.
I do not want to remove the 'THE' form below observations during the process.
AGI THERAPEUTICS
RAPEUTICSTHE AGI
Could you please give me some suggestions?
Tranwrd is nice for replacing text. The transtrN can replace text with "" (nothing).
data want;
set a;
HRM_L2_Step3 = transtrN(HRM_L2_Step3,"THE",trimn(''));
run;
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Just in case the previous suggestions aren't 100% working, here is a possibility:
data want;
set have;
if HRM_L2_Step3 =: 'THE ' then HRM_L2_Step3 = substr(HRM_L2_Step3, 4);
if scan(HRM_L2_Step3, -1) = 'THE' then
substr(HRM_L2_Step3, length(HRM_L2_Step3) - 2, 3) = ' ';
run;
Dear Astounding,
thanks for your advice.
However, for 'THE DOCMESSC', I get ' DOCMESSC'. How can I remove the blank which before 'DOCMESSC'?
Could you please give me some suggestions?
Thanks, Astounding, it works.
However, I have a new problem during the process.
For 'THE * ALUMINIUM POWDER COMPANY', how can I remove 'THE * ' (i.e., only keep 'ALUMINIUM POWDER COMPANY')?
besides, what will happen if I set
data want ;
set a;
if HRM_L2_Step3 =: 'THE ' then HRM_L2_Step3 = substr(HRM_L2_Step3, 5);
if scan(HRM_L2_Step3, -1) = 'THE' then
substr(HRM_L2_Step3, length(HRM_L2_Step3) - 2, 3) = '';
run;
rather than
data want ;
set a;
if HRM_L2_Step3 =: 'THE ' then HRM_L2_Step3 = substr(HRM_L2_Step3, 5);
if scan(HRM_L2_Step3, -1) = 'THE' then
substr(HRM_L2_Step3, length(HRM_L2_Step3) - 2, 3) = ' ';
run;
I did not find any difference when I delete the blanks between two '
thanks for your help.
thanks in advance.
I was sure that using 3 blanks would work. I wasn't sure what would happen if I switched to 1 (or none), so used what I knew would work.
There are probably many characters that you might want to remove, that your original specifications didn't cover. What would happen, for example, if you had THE plus 2 blanks at the beginning of a line? What if any of these characters were the 5th character in the line, following "THE "?
* (as you mentioned)
-
.
$
%
#
Perhaps you should run this program after cleaning up the "THE" occurrences:
proc freq data=have;
tables HRM_L2_Step3;
format HRM_L2_Step3 $1.;
run;
That will show you how many times each possible first character actually appears in the entries that remain.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.