harrylui
Obsidian | Level 7

Good day,

Can I match a word in one string against another string?

string1: ALIPAYHKCUBER COMPANY HKG
string2: ALIPAYHKCUBERCOMPANY

The result I want is:

ALIPAYHKCUBER COMPANY

Also, can I write a program that extracts the part of a string before a blank? For this example, I want to drop all the characters after the second blank:

ALIPAYHKCUBER COMPANY HKG

Thanks in advance,

Harry

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @harrylui 

Here is a way to achieve this, using Perl regular expressions:

 

data have;
	infile datalines dlm="," truncover;
	input string1:$50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG
;
run;

data want;
	set have;
	length string2 string3 $ 100;
	/* string2: look for ALIPAYHKCUBER COMPANY and retrieve this word */
	if prxmatch('/\bALIPAYHKCUBER COMPANY\b/', string1) then 
		string2 = prxchange('s/^.*(\bALIPAYHKCUBER COMPANY\b).*$/$1/',-1, string1);
	/* string3: drop all characters after the second blank */
	if prxmatch('/[^\s]*\s[^\s]*/', string1) then 
		string3 = prxchange('s/^(\s*[^\s]*\s[^\s]*)\s.*$/$1/',-1, string1);
run;

 

 

Explanations:

  • string2
    • If SAS finds the pattern \bALIPAYHKCUBER COMPANY\b (NB: \b = matches a word boundary) using the PRXMATCH function, it then retrieves this word using the PRXCHANGE function. In practice, PRXCHANGE matches the whole string with ^.*(\bALIPAYHKCUBER COMPANY\b).*$ and replaces it with the group in parentheses, $1
      • \s = matches a whitespace character (a blank, a tab, etc.)
      • . = matches any character
      • * = matches the element just before the * 0, 1 or more times
      • ^ = matches the beginning of the string
      • $ = matches the end of the string
  • string3
    • If SAS finds the pattern [^\s]*\s[^\s]* using the PRXMATCH function, it then retrieves this pattern using the PRXCHANGE function. In practice, PRXCHANGE matches the whole string with ^(\s*[^\s]*\s[^\s]*)\s.*$ and replaces it with the group in parentheses, $1
      • \s = matches a whitespace character (a blank, a tab, etc.)
      • . = matches any character
      • [^\s] = matches any character except whitespace
      • * = matches the element just before the * 0, 1 or more times
      • ^ = matches the beginning of the string
      • $ = matches the end of the string
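
As a possible alternative to the match-everything-then-replace approach, the capture group can also be read directly with PRXPARSE and PRXPOSN. This is only a sketch under assumptions: the dataset name want_posn and the variable re are made up for illustration, and the pattern reuses the string3 expression from above without the trailing .*$ part.

```sas
/* Same sample row as in the thread */
data have;
	infile datalines truncover;
	input string1 $50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG
;
run;

data want_posn;  /* hypothetical output dataset name */
	set have;
	length string3 $ 100;
	retain re;   /* hypothetical variable holding the compiled pattern id */
	/* Compile the pattern once, on the first row only */
	if _n_ = 1 then re = prxparse('/^(\s*[^\s]*\s[^\s]*)\s/');
	/* On a match, copy capture buffer 1 into string3 */
	if prxmatch(re, string1) then string3 = prxposn(re, 1, string1);
	drop re;
run;
```

For the sample row, string3 comes out as ALIPAYHKCUBER COMPANY, the same value the PRXCHANGE version produces. Rows that do not match simply leave string3 blank.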

NB, if you use the following code:

data want;
	set have;
	if find(string1,"ALIPAYHKCUBER COMPANY") then string2 = "ALIPAYHKCUBER COMPANY";
	string3 = catx(" ", scan(string1,1," "), scan(string1,2," "));
run;

- the calculation of string3 gives the same result as the Perl regular expression method above when the words are separated by single blanks, as in this example. The two can differ otherwise: SCAN treats consecutive blanks as one delimiter, whereas the pattern matches exactly one whitespace character between the two words.

- the calculation of string2 differs: FIND looks for "ALIPAYHKCUBER COMPANY" as a plain substring, not as whole words. For example, xxxALIPAYHKCUBER COMPANY would be retrieved by the code using the FIND function, but not by the PRXMATCH function, because the pattern also requires a word boundary.
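
To see the word-boundary difference side by side, a small check along these lines may help. It is a sketch only: the dataset names test and compare, the flag variables found_find and found_prx, and the second input row are made up for illustration.

```sas
/* Hypothetical test rows: the second value has extra leading characters */
data test;
	infile datalines truncover;
	input string1 $50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG
xxxALIPAYHKCUBER COMPANY HKG
;
run;

data compare;
	set test;
	/* FIND returns the position of the substring, or 0 if it is absent */
	found_find = find(string1, "ALIPAYHKCUBER COMPANY") > 0;
	/* PRXMATCH returns the position of the match, or 0 if there is none */
	found_prx = prxmatch('/\bALIPAYHKCUBER COMPANY\b/', string1) > 0;
run;

proc print data=compare;
run;
```

Both flags are 1 for the first row. For the second row, found_find is 1 but found_prx is 0, because there is no word boundary between xxx and ALIPAYHKCUBER.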

 

 

All the best,


s_lassen
Meteorite | Level 14

Not quite sure if this is what you want, but:

Given string1="ALIPAYHKCUBER COMPANY HKG" and string2="ALIPAYHKCUBERCOMPANY", you can try this:

Data want;
  set have;
  string1=catx(' ',scan(string1,1,' '),scan(string1,2,' '));
  check=string2=compress(string1);
run;

STRING1 should then contain the first 2 words of the original string, and the variable CHECK is 1 if STRING2 matches that, otherwise 0.
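
A self-contained way to try this might look like the sketch below. It assumes HAVE holds both strings, comma-separated; the second input row is a made-up non-matching value added for illustration.

```sas
/* Assumed layout: string1 and string2 on one line, separated by a comma */
data have;
	infile datalines dlm="," truncover;
	input string1 :$50. string2 :$50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG,ALIPAYHKCUBERCOMPANY
SOME OTHER SHOP HKG,ALIPAYHKCUBERCOMPANY
;
run;

data want;
	set have;
	/* Keep only the first two blank-delimited words */
	string1 = catx(' ', scan(string1, 1, ' '), scan(string1, 2, ' '));
	/* 1 when string2 equals string1 with all blanks removed, else 0 */
	check = (string2 = compress(string1));
run;

proc print data=want;
run;
```

With these two rows, CHECK is 1 for the first (ALIPAYHKCUBER COMPANY compresses to ALIPAYHKCUBERCOMPANY) and 0 for the second.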

harrylui
Obsidian | Level 7

THANK YOU!

ed_sas_member
Meteorite | Level 14
You're welcome @harrylui!
