Hello!
I would like to extract the substring "1514#0_2021Q3_2022Q1$" from the following string:
1514#0 Deep Dive$ 1514#0_2021Q3_2022Q1$ 1515#0 Deep Dive$ 1515#0_2021Q1_2022Q1$ XYZ$ Dictionary$
Two conditions apply: 1) the substring contains at least one letter "Q"; 2) the substrings in the list can appear in any order.
I have used the following pattern, but it fell short:
/1514.*?Q.*?\$?/i
Please see the code for your reference:
%macro prxsubstr(pattern, string);
  %let regex_id = %sysfunc(prxparse(&pattern.));
  %let position = 0;
  %let length = 0;
  %syscall prxsubstr(regex_id, string, position, length);
  %global substring;
  %let substring = %substr(&string., &position., &length.);
  %put &substring.;
%mend;
%let pattern = /1514.*?Q.*?\$?/i;
%let list = 1514#0 Deep Dive$ 1514#0_2021Q3_2022Q1$ 1515#0 Deep Dive$ 1515#0_2021Q1_2022Q1$ ARS_CRS KPI Results$ Dictionary$;
%prxsubstr(&pattern., &list.);
Any help will be appreciated.
Accepted Solutions
Hello @Jedrek369,
Is the "specific character" mentioned in the subject line the dollar sign? If so, why do you make it optional in your pattern?
@Jedrek369 wrote:
/1514.*?Q.*?\$?/i
Perhaps because a missing dollar sign would be acceptable at the end of the list? In that case I would append
(\$|$)
to the end of the pattern (where the first "$" is the literal dollar sign and the second is the end-of-string anchor).
I think it would help to replace the periods (.) in your pattern with "negative character sets" (see Class Groupings) excluding "Q" or "$" or both:
/1514[^Q\$]*Q[^\$]*\$/i
(Also, would a leading word boundary (\b) help or would "161514..." be fine?)
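To illustrate the two optional refinements mentioned above (the "(\$|$)" ending and the leading "\b"), here is a minimal DATA step sketch. The sample list is hypothetical and is not the list from the question: the wanted item comes last without a closing dollar sign, and a "161514..." item is included to show the effect of "\b". The variable names pos and len are arbitrary. TRIM is applied to the source because a DATA step character variable is padded with trailing blanks, which [^\$]* would otherwise absorb.

```sas
data _null_;
  length list $200 substring $60;

  /* Hypothetical list: target item is last and has no closing "$" */
  list = '161514#0_2020Q1$ 1515#0_2021Q1_2022Q1$ 1514#0_2021Q3_2022Q1';

  /* \b     : "1514" must start at a word boundary (rejects "161514")   */
  /* (\$|$) : item ends at a literal dollar sign or at end of string    */
  regex_id = prxparse('/\b1514[^Q\$]*Q[^\$]*(\$|$)/i');

  /* pos and len are set by the call; pos = 0 means no match */
  call prxsubstr(regex_id, trim(list), pos, len);

  if pos > 0 then substring = substr(list, pos, len);
  else substring = ' ';

  put pos= len= substring=;

  /* Release the compiled pattern */
  call prxfree(regex_id);
run;
```

Under these assumptions the log should show substring=1514#0_2021Q3_2022Q1. Without the "\b", the match would instead start inside the first item and return "1514#0_2020Q1$".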
Thank you! "/1514[^Q\$]*Q[^\$]*\$/i" pattern worked.
Would you mind explaining why replacing the dots with "[^Q\$]" and "[^\$]" worked?
Sorry for the late response. I had so many work assignments that I could not find the time to try the code.
You're welcome. Glad to hear that it worked.
There are three fixed parts in your pattern: the "1514", at least one "Q", and the "$" sign, in this order, with the dollar sign ending the pattern. Between the "1514" and the first "Q" almost any character is allowed. A "Q" is logically impossible there, because it would then itself be the first "Q". A "$" sign is not acceptable either, as it would mark the end of an item that does not contain a "Q". This explains why "Q" and "$" had to be excluded in that place. The greedy "*" repetition factor could be used with the [^Q\$] pattern because the search would stop at the "Q" (but not earlier) anyway.
Between the first "Q" and the closing "$", again, almost any character is allowed (this time including "Q"), but logically not a (prematurely closing) "$" sign. This explains why I used [^\$] there. As above, the greedy "*" repetition factor is possible because of the explicit "$" closing the pattern. Alternatively, you could keep your original .*? in that place, because the lazy "*?" repetition factor would stop at the first "$" sign and so would not include another one.
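The explanation above can be checked by running three patterns against the list from the question: the original pattern, the accepted pattern, and the hybrid alternative (negated class before the "Q", lazy .*? after it). This is only a sketch that uses a DATA step instead of the macro from the question; the variable names pos, len and substring are arbitrary.

```sas
data _null_;
  length list $200 pattern $40 substring $60;

  /* List as given in the question's code */
  list = '1514#0 Deep Dive$ 1514#0_2021Q3_2022Q1$ 1515#0 Deep Dive$ 1515#0_2021Q1_2022Q1$ ARS_CRS KPI Results$ Dictionary$';

  /* 1) original, 2) accepted solution, 3) hybrid with lazy .*? after the Q */
  do pattern = '/1514.*?Q.*?\$?/i',
               '/1514[^Q\$]*Q[^\$]*\$/i',
               '/1514[^Q\$]*Q.*?\$/i';

    /* STRIP removes the blank padding of the pattern variable */
    regex_id = prxparse(strip(pattern));

    /* pos = 0 means no match */
    call prxsubstr(regex_id, list, pos, len);

    if pos > 0 then substring = substr(list, pos, len);
    else substring = ' ';

    put pattern= pos= len= substring=;

    /* Release the compiled pattern before the next iteration */
    call prxfree(regex_id);
  end;
run;
```

The original pattern should report pos=1 len=30, that is "1514#0 Deep Dive$ 1514#0_2021Q". The dot also matches "$", so the lazy .*? that starts at the first "1514" runs across the item boundary until it meets the first "Q", and the optional \$? then matches nothing. The other two patterns should both report pos=19 len=21, which is the wanted "1514#0_2021Q3_2022Q1$".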
Thank you! Now it is clear.