harrylui
Obsidian | Level 7

Good Day,

Can I scan for a word using another string?

string1                      string2
ALIPAYHKCUBER COMPANY HKG    ALIPAYHKCUBERCOMPANY

The result I want is:

 ALIPAYHKCUBER COMPANY

Also, can I write a program that extracts the string before a blank?

For this example, I want to drop all the characters from the second blank onward:

ALIPAYHKCUBER COMPANY HKG

Thanks in advance,

Harry

 

Accepted Solution
ed_sas_member
Meteorite | Level 14

Hi @harrylui 

Here is a way to achieve this, using Perl regular expressions:

data have;
	infile datalines dlm="," truncover;
	input string1:$50.;
	datalines;
	ALIPAYHKCUBER COMPANY HKG
;
run;

data want;
	set have;
	length string2 string3 $ 100;
	/* string2: look for ALIPAYHKCUBER COMPANY and retrieve this word */
	if prxmatch('/\bALIPAYHKCUBER COMPANY\b/', string1) then 
		string2 = prxchange('s/^.*(\bALIPAYHKCUBER COMPANY\b).*$/$1/',-1, string1);
	/* string3: keep the first two words, i.e. drop everything from the second blank onward */
	if prxmatch('/[^\s]*\s[^\s]*/', string1) then 
		string3 = prxchange('s/^(\s*[^\s]*\s[^\s]*)\s.*$/$1/',-1, string1);
run;

 

 

Explanations:

  • string2
    • If SAS finds the pattern \bALIPAYHKCUBER COMPANY\b with the PRXMATCH function (NB: \b matches a word boundary), it then retrieves this text with the PRXCHANGE function. In practice, PRXCHANGE matches the whole pattern ^.*(\bALIPAYHKCUBER COMPANY\b).*$ and replaces it with the group in parentheses, $1.
      • \s = matches a blank (more precisely, any whitespace character)
      • . = matches any character
      • * = matches the preceding character 0, 1 or more times
      • ^ = matches the beginning of the string
      • $ = matches the end of the string
  • string3
    • If SAS finds the pattern [^\s]*\s[^\s]* with the PRXMATCH function, it then retrieves the first two words with the PRXCHANGE function. In practice, PRXCHANGE matches the whole pattern ^(\s*[^\s]*\s[^\s]*)\s.*$ and replaces it with the group in parentheses, $1.
      • \s = matches a blank (more precisely, any whitespace character)
      • . = matches any character
      • [^\s] = matches any character except a blank
      • * = matches the preceding character 0, 1 or more times
      • ^ = matches the beginning of the string
      • $ = matches the end of the string
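As an alternative to rewriting the whole string with PRXCHANGE, the capture group can also be read directly. The sketch below is not part of the original answer: it compiles the pattern once with PRXPARSE and retrieves group 1 with PRXPOSN. The data set names HAVE_DEMO and WANT_POSN are hypothetical, and the pattern is a variant of the one above that requires at least two non-blank words.

```sas
/* Hypothetical illustration data: one two-plus-word value, one single word */
data have_demo;
	infile datalines truncover;
	input string1 $50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG
ONEWORD
;
run;

data want_posn;
	set have_demo;
	length string3 $ 100;
	retain re;
	/* Compile the pattern only once, on the first iteration */
	if _n_ = 1 then re = prxparse('/^\s*([^\s]+\s+[^\s]+)/');
	/* Group 1 = first word, the blank(s) after it, and the second word */
	if prxmatch(re, string1) then string3 = prxposn(re, 1, string1);
	drop re;
run;
```

With this variant, a value containing a single word does not match, so STRING3 stays blank for it instead of receiving a copy of the word.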

NB: if you use the following code instead, without regular expressions:

data want;
	set have;
	if find(string1,"ALIPAYHKCUBER COMPANY") then string2 = "ALIPAYHKCUBER COMPANY";
	string3 = catx(" ", scan(string1,1," "), scan(string1,2," "));
run;

- the calculation of string3 gives the same result as the Perl regular expression method above for values like this one, where the words are separated by single blanks. The two methods can differ when there are leading or consecutive blanks, because SCAN treats a run of blanks as a single delimiter.

- the calculation of string2 differs: FIND looks for "ALIPAYHKCUBER COMPANY" as a plain substring, not as two whole words. For example, xxxALIPAYHKCUBER COMPANY would be matched by this code using the FIND function, but not by the PRXMATCH function, which also requires a word boundary.
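For the second question (keep the text before the second blank), a third option is to locate the blanks with FIND and cut with SUBSTR. This is a sketch under assumptions, not the answer's method: the data set names HAVE_DEMO2 and WANT_FIND are hypothetical, and unlike SCAN it counts every blank, so consecutive blanks are not merged.

```sas
/* Hypothetical illustration data */
data have_demo2;
	infile datalines truncover;
	input string1 $50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG
ALIPAYHKCUBER COMPANY
;
run;

data want_find;
	set have_demo2;
	length string3 $ 100;
	/* Position of the first blank, then of the next blank after it.
	   STRING1 is padded with trailing blanks, so for a two-word value
	   p2 is simply the first trailing blank. */
	p1 = find(string1, " ");
	p2 = find(string1, " ", p1 + 1);
	/* Keep everything before the second blank, if one was found */
	if p1 > 0 and p2 > 0 then string3 = substr(string1, 1, p2 - 1);
	else string3 = string1;
	drop p1 p2;
run;
```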

 

 

All the best,

s_lassen
Meteorite | Level 14

Not quite sure if this is what you want, but:

Given string1="ALIPAYHKCUBER COMPANY HKG" and string2="ALIPAYHKCUBERCOMPANY", you can try this:

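Data want;
  set have;
  /* keep only the first two words of STRING1 */
  string1=catx(' ',scan(string1,1,' '),scan(string1,2,' '));
  /* CHECK = 1 if STRING2 equals STRING1 with its blanks removed, else 0 */
  check=string2=compress(string1);
run;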

STRING1 should then contain the first two words of the original string, and the variable CHECK is 1 if STRING2 matches those words with the blanks removed, otherwise 0.
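The same COMPRESS comparison can be extended so that the number of words is not fixed at two: add the words of STRING1 one at a time until their blank-free version equals STRING2. This is a hypothetical sketch, not taken from the replies above; the data set names HAVE_PAIRS and WANT_PAIRS and the second data row are made up for illustration, and the comparison is case-sensitive (wrap both sides in UPCASE if that matters).

```sas
/* Hypothetical input: STRING2 is the same name written without blanks */
data have_pairs;
	infile datalines dlm="," truncover;
	input string1 :$50. string2 :$50.;
	datalines;
ALIPAYHKCUBER COMPANY HKG,ALIPAYHKCUBERCOMPANY
ACME LTD HKG,ACMELIMITED
;
run;

data want_pairs;
	set have_pairs;
	length result candidate $ 100;
	/* Add the words of STRING1 one at a time and stop as soon as the
	   blank-free version of the words collected so far equals STRING2 */
	do i = 1 to countw(string1, " ");
		candidate = catx(" ", candidate, scan(string1, i, " "));
		if compress(candidate) = compress(string2) then do;
			result = candidate;
			leave;
		end;
	end;
	drop i candidate;
run;
```

For the first row RESULT should be "ALIPAYHKCUBER COMPANY"; when no leading run of words matches, as in the second row, RESULT stays blank.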

harrylui
Obsidian | Level 7

THANK YOU!

ed_sas_member
Meteorite | Level 14
You're welcome @harrylui!

