Good day,
Can I scan for a word in one string using another string?
string1: ALIPAYHKCUBER COMPANY HKG
string2: ALIPAYHKCUBERCOMPANY
The result I want is:
ALIPAYHKCUBER COMPANY
Also, can I write a program that extracts the part of a string before a blank? For this example, I want to drop all the characters after the second blank:
ALIPAYHKCUBER COMPANY HKG
Thanks in advance,
Harry
Hi @harrylui
Here is a way to achieve this, using Perl regular expressions:
data have;
    infile datalines dlm="," truncover;
    input string1 :$50.;
    datalines;
ALIPAYHKCUBER COMPANY HKG
;
run;
data want;
    set have;
    length string2 string3 $ 100;
    /* string2: look for ALIPAYHKCUBER COMPANY as whole words and extract it */
    if prxmatch('/\bALIPAYHKCUBER COMPANY\b/', string1) then
        string2 = prxchange('s/^.*(\bALIPAYHKCUBER COMPANY\b).*$/$1/', -1, string1);
    /* string3: keep the first two words, dropping everything from the second blank onward */
    if prxmatch('/[^\s]*\s[^\s]*/', string1) then
        string3 = prxchange('s/^(\s*[^\s]*\s[^\s]*)\s.*$/$1/', -1, string1);
run;
Explanations:
- string2: the code first checks that string1 contains ALIPAYHKCUBER COMPANY as whole words (NB: \b matches a word boundary) using the PRXMATCH function, then retrieves it using the PRXCHANGE function. In practice, PRXCHANGE matches the whole pattern ^.*(\bALIPAYHKCUBER COMPANY\b).*$ and replaces it with the group in parentheses, $1.
- string3: the same approach is used; [^\s] matches any character except a blank.
NB: if you use the following code instead:
data want;
    set have;
    if find(string1, "ALIPAYHKCUBER COMPANY") then string2 = "ALIPAYHKCUBER COMPANY";
    string3 = catx(" ", scan(string1, 1, " "), scan(string1, 2, " "));
run;
- the calculation of string3 gives the same result as the Perl regular expression method above for values like this one, where the words are separated by single blanks.
- the calculation of string2 differs: FIND looks for "ALIPAYHKCUBER COMPANY" as a plain substring, not as two whole words. For example, xxxALIPAYHKCUBER COMPANY would be matched by the code using the FIND function. It would not be matched by the PRXMATCH function, because the pattern also requires a word boundary.
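To make the difference concrete, here is a minimal sketch that runs both tests side by side on the original value and on the xxx-prefixed value. The dataset name demo and the variables found_by_find and found_by_prx are made up for this illustration.

```sas
data demo;
    infile datalines truncover;
    input string1 $50.;
    /* FIND returns the position of the substring, or 0 if it is absent */
    found_by_find = (find(string1, "ALIPAYHKCUBER COMPANY") > 0);
    /* PRXMATCH returns the position of the match, or 0 if there is none; */
    /* \b requires a word boundary on each side of the phrase             */
    found_by_prx = (prxmatch('/\bALIPAYHKCUBER COMPANY\b/', string1) > 0);
    datalines;
ALIPAYHKCUBER COMPANY HKG
xxxALIPAYHKCUBER COMPANY HKG
;
run;

proc print data=demo;
run;
```

For the first row both flags should be 1. For the second row found_by_find should be 1 while found_by_prx should be 0, since there is no word boundary between xxx and ALIPAYHKCUBER.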
All the best,
Not quite sure if this is what you want, but:
Given string1="ALIPAYHKCUBER COMPANY HKG" and string2="ALIPAYHKCUBERCOMPANY", you can try this:
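data want;
    set have;
    /* keep only the first two words of string1 */
    string1 = catx(' ', scan(string1, 1, ' '), scan(string1, 2, ' '));
    /* 1 if string2 equals string1 with the blanks removed, otherwise 0 */
    check = (string2 = compress(string1));
run;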
STRING1 should then contain the first 2 words of the original string, and the variable CHECK is 1 if STRING2 matches that, otherwise 0.
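If it helps to try this out, here is a self-contained sketch under the same assumption (both strings sit in one row of HAVE). It builds a one-row input dataset first and writes the result to a new variable instead of overwriting STRING1. The name first_two is made up for this example.

```sas
/* One-row test dataset holding both strings from the question */
data have;
    length string1 string2 $ 50;
    string1 = "ALIPAYHKCUBER COMPANY HKG";
    string2 = "ALIPAYHKCUBERCOMPANY";
run;

data want;
    set have;
    length first_two $ 50;
    /* first two blank-delimited words of string1 */
    first_two = catx(' ', scan(string1, 1, ' '), scan(string1, 2, ' '));
    /* compare string2 with first_two after removing its blanks */
    check = (string2 = compress(first_two));
run;

proc print data=want;
run;
```

With this input, first_two should be ALIPAYHKCUBER COMPANY and check should be 1.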
THANK YOU!