Hello!
I would like to extract the substring "1514#0_2021Q3_2022Q1$" from the following string:
1514#0 Deep Dive$ 1514#0_2021Q3_2022Q1$ 1515#0 Deep Dive$ 1515#0_2021Q1_2022Q1$ XYZ$ Dictionary$
Two conditions apply: 1) the substring contains at least one letter "Q", 2) the substrings in the list can appear in any order.
I have used the following pattern, but it fell short:
/1514.*?Q.*?\$?/i
Please see the code for your reference:
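%macro prxsubstr(pattern, string);
%local regex_id position length;
%global substring;
%let substring = ;
/* Compile the pattern; position and length receive the match location */
%let regex_id = %sysfunc(prxparse(&pattern.));
%let position = 0;
%let length = 0;
%syscall prxsubstr(regex_id, string, position, length);
/* position stays 0 when nothing matched, so call %substr only on a match */
%if &position. > 0 %then %let substring = %substr(&string., &position., &length.);
%put &substring.;
/* Release the compiled pattern */
%syscall prxfree(regex_id);
%mend;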
%let pattern = /1514.*?Q.*?\$?/i;
%let list = 1514#0 Deep Dive$ 1514#0_2021Q3_2022Q1$ 1515#0 Deep Dive$ 1515#0_2021Q1_2022Q1$ ARS_CRS KPI Results$ Dictionary$;
%prxsubstr(&pattern., &list.);
Any help will be appreciated.
Hello @Jedrek369,
Is the "specific character" mentioned in the subject line the dollar sign? If so, why do you make it optional in your pattern?
@Jedrek369 wrote:
/1514.*?Q.*?\$?/i
Perhaps because a missing dollar sign would be acceptable at the end of the list? In this case I would put
(\$|$)
to the end of the pattern (where the first "$" is the dollar sign and the second the end-of-string mark).
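As a minimal sketch of that ending, the DATA step below runs the pattern against a made-up list in which the wanted item comes last and has lost its dollar sign. The test string and the variable names (list, rx, pos, len, substring) are assumptions for illustration only, and the character classes are the ones from the pattern suggested further down in this reply, not from the original pattern.

```sas
data _null_;
  length list $200 substring $50;
  /* Made-up test list: the wanted item is last and has no closing dollar sign */
  list = 'XYZ$ Dictionary$ 1514#0_2021Q3_2022Q1';
  /* (\$|$) accepts either a literal dollar sign or the end of the string */
  rx = prxparse('/1514[^Q\$]*Q[^\$]*(\$|$)/i');
  /* TRIM drops the blanks that pad LIST, so the match ends right after the item */
  call prxsubstr(rx, trim(list), pos, len);
  if pos > 0 then substring = substr(list, pos, len);
  put pos= len= substring=;
  /* Release the compiled pattern */
  call prxfree(rx);
run;
```

If the pattern behaves as I expect, the log should show substring=1514#0_2021Q3_2022Q1. Inside the macro from the question TRIM should not be needed, because macro variable values carry no trailing blanks.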
I think it would help to replace the periods (.) in your pattern with "negative character sets" (see Class Groupings) excluding "Q" or "$" or both:
/1514[^Q\$]*Q[^\$]*\$/i
(Also, would a leading word boundary (\b) help or would "161514..." be fine?)
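To show what the word boundary would change, here is a small sketch with a made-up list that starts with an item beginning "161514". The list and the variable names (rx1, rx2, without_b, with_b) are hypothetical and only serve the comparison.

```sas
data _null_;
  length list $200 without_b with_b $50;
  /* Made-up list: the first item merely contains "1514" inside "161514" */
  list = '161514#0_2020Q1_2020Q2$ 1514#0_2021Q3_2022Q1$';
  rx1 = prxparse('/1514[^Q\$]*Q[^\$]*\$/i');    /* no word boundary */
  rx2 = prxparse('/\b1514[^Q\$]*Q[^\$]*\$/i');  /* leading word boundary */
  call prxsubstr(rx1, list, pos, len);
  if pos > 0 then without_b = substr(list, pos, len);
  call prxsubstr(rx2, list, pos, len);
  if pos > 0 then with_b = substr(list, pos, len);
  put without_b= / with_b=;
  /* Release both compiled patterns */
  call prxfree(rx1);
  call prxfree(rx2);
run;
```

Without \b the match should start inside "161514" and return 1514#0_2020Q1_2020Q2$. With \b that position is rejected, because there is no word boundary between "6" and "1", so the second item, 1514#0_2021Q3_2022Q1$, should be returned instead.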
Thank you! "/1514[^Q\$]*Q[^\$]*\$/i" pattern worked.
Would you mind explaining why replacing the dots with "[^Q\$]" and "[^\$]" worked?
Sorry for the late response. I had so many work assignments I did not find the time for the code.
You're welcome. Glad to hear that it worked.
There are three fixed parts in your pattern: the "1514", at least one "Q" and the "$" sign -- in this order, with the dollar sign ending the pattern. Between the "1514" and the first "Q" almost any characters are allowed. A "Q" is logically impossible there, because it would then itself be the first "Q". A "$" sign is not acceptable either, as it would mark the end of a list item that does not contain a "Q". This explains why "Q" and "$" had to be excluded in that place. The greedy "*" repetition factor could be used with the [^Q\$] pattern because the search would stop at the "Q" (but not earlier) anyway.
Between the first "Q" and the closing "$", again, almost any characters are allowed (this time including "Q"), but logically not a (prematurely closing) "$" sign. This explains why I used [^\$] there. As above, the greedy "*" repetition factor is possible because of the explicit "$" closing the pattern. Alternatively, however, you could leave your original .*? in that place because the lazy "*?" repetition factor would prevent the pattern from including another "$" sign.
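The difference can be checked with a short DATA step that applies three patterns to the list from the original post: the original one, the one with both character classes, and the hybrid that keeps the lazy .*? after the "Q". This is only a sketch; the variable names (list, pattern, rx, pos, len, substring) are my own choice.

```sas
data _null_;
  length list $200 pattern $40 substring $60;
  list = '1514#0 Deep Dive$ 1514#0_2021Q3_2022Q1$ 1515#0 Deep Dive$ 1515#0_2021Q1_2022Q1$ XYZ$ Dictionary$';
  do pattern = '/1514.*?Q.*?\$?/i',          /* original pattern */
               '/1514[^Q\$]*Q[^\$]*\$/i',    /* both dots replaced */
               '/1514[^Q\$]*Q.*?\$/i';       /* lazy .*? kept after the Q */
    /* TRIM keeps the blanks that pad PATTERN out of the regular expression */
    rx = prxparse(trim(pattern));
    call prxsubstr(rx, list, pos, len);
    if pos > 0 then substring = substr(list, pos, len);
    else substring = ' ';
    put pattern= substring=;
    /* Release the compiled pattern before the next iteration */
    call prxfree(rx);
  end;
run;
```

The original pattern should return "1514#0 Deep Dive$ 1514#0_2021Q": the first .*? is allowed to run across the "$" of the first item until it reaches a "Q", and the second .*? together with the optional \$? is then satisfied by matching nothing at all. The other two patterns cannot start at the first "1514", because a "$" comes before any "Q", so both should return 1514#0_2021Q3_2022Q1$.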
Thank you! Now it is clear.