geoonline
Calcite | Level 5

Hi Experts,

 

I have a variable that holds several strings separated by a pipe character, for example: VAR1 = STRING1|STRING2|STRING3 and so on.

Only one of these strings will contain the KEYWORD "EXTRACT_ME".

I would like to extract only the string that contains the EXTRACT_ME keyword (an example is shown below).

 

I can definitely achieve this with a DO loop that searches for the KEYWORD in each string one by one and extracts the match.

But I'm looking for something more efficient that doesn't need a DO loop. Maybe a Perl regular expression?

Experts: Any advice? I appreciate your help.

 

 

INPUT_VAR                                                  OUTPUT_VAR
FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING                 EXTRACT_ME[ABC]
STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR    EXTRACT_ME[ABCDEFG]
4 REPLIES
Reeza
Super User

 

You can do it with a combination of string functions, but a Perl regular expression will probably work better.

 

data have;
string="FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING";output;
string="STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR";output;
run;

data want;
	set have;
	loc= index(string, "EXTRACT_ME");
	if loc>0 then end= index(substr(string, loc), "|");
	want=substr(string, loc, end-1);
run;
Astounding
PROC Star

You'll need a slight modification, since LOC > 0 should control execution of the remainder of the statements (not just one).

 

A similar possibility:

 

if loc > 0 then want = scan(substr(string, loc), 1, '|');
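
Put together as a full step, this might look like the sketch below. It assumes the variable names from the code above; the third row (no keyword) is a hypothetical addition to show that WANT stays blank when LOC is 0.

```sas
data have;
  length string $ 200;
  string = "FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING"; output;
  string = "STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR"; output;
  string = "NO KEYWORD HERE|JUST TEXT"; output; /* hypothetical row without the keyword */
run;

data want;
  set have;
  length want $ 200;
  /* position of the keyword, 0 if it does not occur */
  loc = index(string, "EXTRACT_ME");
  /* take everything from the keyword up to the next pipe (or the end of the value) */
  if loc > 0 then want = scan(substr(string, loc), 1, '|');
  drop loc;
run;
```

Note that this starts at the keyword itself, so if the keyword can appear in the middle of a segment, any text between the preceding pipe and the keyword is not returned.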

Ksharp
Super User
What reason forces you to use PRX?


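data have;
string="FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING";output;
string="STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR";output;
run;
data want;
 set have;
 length want $ 200;
 /* EXTRACT_ME followed by word characters in square brackets; o = compile once, i = ignore case */
 pid=prxparse('/EXTRACT_ME\[\w+\]/oi');
 /* p = start position of the match (0 if none), l = length of the match */
 call prxsubstr(pid,string,p,l);
 if p then want=substr(string,p,l);
 drop pid p l;
run;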


FreelanceReinh
Jade | Level 19

Just in case you don't need the redundant "EXTRACT_ME[]" but only what's in the square brackets after this keyword, you can modify Ksharp's Perl regular expression, for example as follows:

pid=prxparse('/(?<=EXTRACT_ME\[)[^\]]+/oi');

This expression matches a sequence of one or more characters not equal to "]" after the text "EXTRACT_ME[" (case-insensitive).
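
For completeness, here is a sketch of how this expression could be dropped into Ksharp's data step. The step is unchanged apart from the pattern, and the sample data is copied from the posts above.

```sas
data have;
  string = "FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING"; output;
  string = "STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR"; output;
run;

data want;
  set have;
  length want $ 200;
  /* the lookbehind (?<=...) requires "EXTRACT_ME[" before the match but does not include it */
  pid = prxparse('/(?<=EXTRACT_ME\[)[^\]]+/oi');
  /* p = start position of the match (0 if none), l = length of the match */
  call prxsubstr(pid, string, p, l);
  if p then want = substr(string, p, l);
  drop pid p l;
run;
```

With the two sample rows, WANT should contain ABC and ABCDEFG, respectively.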
