Hi Experts,
I have a variable that contains several strings separated by a pipe sign, e.g. VAR1 = STRING1|STRING2|STRING3 and so on.
Only one of the strings will contain the KEYWORD "EXTRACT_ME".
I would like to extract only the string that contains the EXTRACT_ME keyword (an example is shown below).
I can definitely achieve this with a DO loop that searches each string for the KEYWORD one by one and extracts the match.
But I'm looking for something more efficient that doesn't need a DO loop. Maybe a Perl regular expression?
Experts: Any advice? I appreciate your help.
INPUT_VAR:  FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING
OUTPUT_VAR: EXTRACT_ME[ABC]

INPUT_VAR:  STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR
OUTPUT_VAR: EXTRACT_ME[ABCDEFG]
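For reference, the DO-loop approach mentioned in the question might look roughly like the sketch below. The data set name HAVE, the helper variables I and PIECE, and the lengths of 100 and 200 are assumptions made for illustration; only INPUT_VAR and OUTPUT_VAR come from the example above.

```sas
/* Sample data; the data set name HAVE and the length 100 are assumptions. */
data have;
  length input_var $ 100;
  input_var = "FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING"; output;
  input_var = "STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR"; output;
run;

data want;
  set have;
  length output_var piece $ 200;   /* assumed maximum segment length */
  /* Walk through the pipe-delimited segments one by one. */
  do i = 1 to countw(input_var, '|');
    piece = scan(input_var, i, '|');
    /* Keep the first segment that contains the keyword and stop looking. */
    if index(piece, 'EXTRACT_ME') > 0 then do;
      output_var = piece;
      leave;
    end;
  end;
  drop i piece;
run;
```

This is the baseline that the replies below try to improve on by avoiding the explicit loop.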
You can do it with a combination of string functions, but PERL will probably work better.
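data have;
/* Declare the length explicitly; otherwise it is taken from the first (shorter) literal and the second value is truncated. */
length string $ 100;
string="FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING";output;
string="STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR";output;
run;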
data want;
set have;
loc= index(string, "EXTRACT_ME");
if loc>0 then end= index(substr(string, loc), "|");
want=substr(string, loc, end-1);
run;
You'll need a slight modification, since LOC > 0 should control execution of the remainder of the statements (not just one).
A similar possibility:
if loc > 0 then want = scan(substr(string, loc), 1, '|');
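Put together, a guarded version of the step could look like the following sketch. It assumes the HAVE data set from the earlier reply; the LENGTH of 200 for WANT is an assumption, not something stated in the thread.

```sas
data want;
  set have;
  length want $ 200;   /* assumed maximum length of the extracted text */
  /* Position of the keyword, or 0 if it does not occur. */
  loc = index(string, "EXTRACT_ME");
  /* Extract only when the keyword was found; otherwise WANT stays blank. */
  /* SCAN returns everything from the keyword up to the next pipe (or the end of the string). */
  if loc > 0 then want = scan(substr(string, loc), 1, '|');
  drop loc;
run;
```

Because SCAN stops at the next pipe or at the end of the value, this also covers the case where the keyword sits in the last segment. Note that it returns the text starting at the keyword, so anything preceding EXTRACT_ME within the same segment would not be included.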
What reason forces you to use PRX?
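data have;
/* Declare the length explicitly; otherwise it is taken from the first (shorter) literal and the second value is truncated. */
length string $ 100;
string="FIRST STRING|EXTRACT_ME[ABC]|SECOND STRING";output;
string="STRING ONE|EXTRACT_ME[ABCDEFG]|STRING THREE|STRING FOUR";output;
run;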
data want;
set have;
length want $ 200;
pid=prxparse('/EXTRACT_ME\[\w+\]/oi');
call prxsubstr(pid,string,p,l);
if p then want=substr(string,p,l);
drop pid p l;
run;
Just in case you don't need the redundant "EXTRACT_ME[]" wrapper, but only what's inside the square brackets after this keyword, you can modify Ksharp's Perl regular expression, for example as follows:
pid=prxparse('/(?<=EXTRACT_ME\[)[^\]]+/oi');
This expression matches a sequence of one or more characters not equal to "]" after the text "EXTRACT_ME[" (case-insensitive).
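Dropped into Ksharp's DATA step, the modified expression could be used as in this sketch. It assumes the same HAVE data set; the variable name INSIDE and its length of 200 are hypothetical choices for illustration.

```sas
data want;
  set have;
  length inside $ 200;   /* hypothetical variable name, assumed length */
  /* Lookbehind: the match starts right after "EXTRACT_ME[" and runs up to, but not including, "]". */
  pid = prxparse('/(?<=EXTRACT_ME\[)[^\]]+/oi');
  call prxsubstr(pid, string, p, l);
  /* P is 0 when there is no match, so INSIDE stays blank in that case. */
  if p then inside = substr(string, p, l);
  drop pid p l;
run;
```

For the two sample rows this should leave INSIDE as ABC and ABCDEFG respectively.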