SAS Programming

DATA Step, Macro, Functions and more
Jedrek369
Fluorite | Level 6

Hello!

I would like to extract the substring "1514#0_2021Q3_2022Q1$" from the following string:

 

1514#0 Deep Dive$   1514#0_2021Q3_2022Q1$   1515#0 Deep Dive$   1515#0_2021Q1_2022Q1$   XYZ$  Dictionary$

Two conditions apply: 1) the substring contains at least one "Q", and 2) the substrings in the list appear in no particular order.

 

I have used the following pattern, but it fell short:

 

/1514.*?Q.*?\$?/i

 

Please see the code for your reference:

%macro prxsubstr(pattern, string);
	%let regex_id = %sysfunc(prxparse(&pattern.));
	%let position = 0;
	%let length = 0;
	%syscall prxsubstr(regex_id, string, position, length);
	%global substring;
	%let substring = %substr(&string., &position., &length.);
	%put &substring.;
%mend;

%let pattern = /1514.*?Q.*?\$?/i;
%let list = 1514#0 Deep Dive$   1514#0_2021Q3_2022Q1$   1515#0 Deep Dive$   1515#0_2021Q1_2022Q1$   ARS_CRS KPI Results$  Dictionary$;

%prxsubstr(&pattern., &list.);

Any help will be appreciated.

 

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hello @Jedrek369,

 

Is the "specific character" mentioned in the subject line the dollar sign? If so, why do you make it optional in your pattern?


@Jedrek369 wrote:
/1514.*?Q.*?\$?/i

Perhaps because a missing dollar sign would be acceptable at the end of the list? In that case I would append

(\$|$)

to the pattern (the first "$" is the literal dollar sign, the second the end-of-string mark).
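
To illustrate the difference between the optional \$? and (\$|$), here is a minimal DATA step sketch. It is not part of the original post: the one-item test string without a closing dollar sign and the variable names text, len and substring are assumptions made for the example. TRIM is applied so that the trailing blanks of the padded character variables do not take part in the search.

```sas
data _null_;
   length text $60 pattern $40 substring $60;
   /* Hypothetical one-item list whose closing dollar sign is missing */
   text = '1514#0_2021Q3_2022Q1';
   /* 1st pattern: optional dollar sign; 2nd: dollar sign OR end of string */
   do pattern = '/1514.*?Q.*?\$?/i', '/1514.*?Q.*?(\$|$)/i';
      regex_id = prxparse(trim(pattern));
      /* TRIM keeps the trailing blanks of the padded variable out of the search */
      call prxsubstr(regex_id, trim(text), position, len);
      if position > 0 then substring = substr(text, position, len);
      else substring = ' ';
      put pattern= substring=;
      call prxfree(regex_id);
   end;
run;
```

With the optional \$? the lazy .*? has no reason to expand, so the match should end right after the first "Q" (1514#0_2021Q). With (\$|$) the lazy .*? has to keep expanding until it reaches a dollar sign or the end of the string, so the whole item should be returned.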

 

I think it would help to replace the periods (.) in your pattern with "negative character sets" (see Class Groupings) excluding "Q" or "$" or both:

/1514[^Q\$]*Q[^\$]*\$/i

(Also, would a leading word boundary (\b) help or would "161514..." be fine?)
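
A small sketch of what the word boundary would change, assuming a made-up list in which an item starting with "161514" precedes the wanted one (the test string and the variable names are hypothetical, not from the original data):

```sas
data _null_;
   length text $100 pattern $40 substring $60;
   /* Hypothetical list: "161514..." contains "1514" but does not start with it */
   text = '161514#0_2021Q1_2021Q4$   1514#0_2021Q3_2022Q1$';
   /* 1st pattern: no boundary check; 2nd: leading word boundary \b */
   do pattern = '/1514[^Q\$]*Q[^\$]*\$/i', '/\b1514[^Q\$]*Q[^\$]*\$/i';
      regex_id = prxparse(trim(pattern));
      call prxsubstr(regex_id, trim(text), position, len);
      if position > 0 then substring = substr(text, position, len);
      else substring = ' ';
      put pattern= position= substring=;
      call prxfree(regex_id);
   end;
run;
```

Without \b the first match should start inside "161514..." (at its third character), because the search simply takes the leftmost "1514". With \b that position is rejected, since "6" and "1" are both word characters, and the match should move on to the item that really begins with "1514".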

Jedrek369
Fluorite | Level 6

Thank you! "/1514[^Q\$]*Q[^\$]*\$/i" pattern worked.

 

Would you mind explaining why replacing the dots with "[^Q\$]" and "[^\$]" worked?

 

Sorry for the late response. I had so many work assignments that I could not find the time to test the code.

FreelanceReinh
Jade | Level 19

You're welcome. Glad to hear that it worked.

 

There are three fixed parts in your pattern: the "1514", at least one "Q" and the "$" sign, in this order, with the dollar sign ending the pattern. Between the "1514" and the first "Q" almost any characters are allowed. A "Q" is logically impossible there, because it would then be the first "Q", and a "$" sign is not acceptable either, because it would mark the end of an item that does not contain a "Q". This explains why "Q" and "$" had to be excluded in that place. The greedy "*" repetition factor could be used with the [^Q\$] pattern because the search would stop at the first "Q" (but not earlier) anyway.

 

Between the first "Q" and the closing "$", again, almost any characters are allowed (this time including "Q"), but logically not a (prematurely closing) "$" sign. This explains why I used [^\$] there. As above, the greedy "*" repetition factor is possible because of the explicit "$" closing the pattern. Alternatively, you could leave your original .*? in that place: as long as the closing \$ is mandatory (not followed by "?"), the lazy "*?" repetition factor would stop at the first "$" sign and so prevent the pattern from including another one.
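
For anyone who wants to check the reasoning above, here is a minimal DATA step sketch that runs three patterns against the list from the question: the original one, the accepted one, and the variant with a lazy .*? before a mandatory \$. A DATA step is used instead of the macro only to keep the example short; the variable names are assumptions made for the example.

```sas
data _null_;
   length text $200 pattern $40 substring $60;
   text = '1514#0 Deep Dive$   1514#0_2021Q3_2022Q1$   1515#0 Deep Dive$   1515#0_2021Q1_2022Q1$   XYZ$  Dictionary$';
   /* 1st: original (dots, optional dollar sign)            */
   /* 2nd: accepted (negated classes, greedy repetition)    */
   /* 3rd: variant (lazy .*? before a mandatory dollar sign) */
   do pattern = '/1514.*?Q.*?\$?/i',
                '/1514[^Q\$]*Q[^\$]*\$/i',
                '/1514[^Q\$]*Q.*?\$/i';
      regex_id = prxparse(trim(pattern));
      call prxsubstr(regex_id, trim(text), position, len);
      /* position is 0 when nothing matches, so guard the SUBSTR call */
      if position > 0 then substring = substr(text, position, len);
      else substring = ' ';
      put pattern= position= len= substring=;
      call prxfree(regex_id);
   end;
run;
```

The original pattern should start at the first "1514", run across the "$" of "Deep Dive$" and stop right after the first "Q", because the dot also matches "$" and the optional \$? lets the lazy .*? stay empty. The second and third patterns should both return 1514#0_2021Q3_2022Q1$: the attempt at the first "1514" fails when [^Q\$]* hits a "$" before any "Q", so the search moves on to the second item.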
