I've got a data step in which I extract a number from a string. The strings change from time to time, but only in the part before the word "Incident" (sometimes numbers appear there, but I don't need those). So what I can do is write two data steps with these statements:
PATTERN = PRXPARSE("/^Incident/"); --> start from this word
PATTERN = PRXPARSE("/\d\d\d\d\d?/"); --> collect the desired number
But it must be possible to combine these statements, I think? That would save a lot of time and space! :-) Thanks in advance!
Are you looking for something like this?
DATA work.test (DROP = regExp) ;
INFILE CARDS DLM = ";" ;
INPUT text :$40. ;
RETAIN regExp ;
IF _N_=1 THEN regExp = PRXPARSE("/(I|i)ncident(\d+).?/") ;
IF PRXMATCH(regExp, text) THEN number = PRXPOSN(regExp, 2, text)+0 ;
CARDS ;
No incident at all
Incident2 @ 12:00
;
RUN ;
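IF _N_ = 1 THEN DO;
   RETAIN PATTERN;
   PATTERN = PRXPARSE("/\d\d\d\d\d?/");
   IF MISSING(PATTERN) THEN DO;
      PUT "ERROR IN COMPILING REGULAR EXPRESSION";
   END;
END;
* START and LENGTH are assumed to be set by CALL PRXSUBSTR;
CALL PRXSUBSTR(PATTERN, test1, START, LENGTH);
IF START GT 0 THEN DO;
   NUMBER = SUBSTR(test1,START,LENGTH);
   NUMBER = COMPRESS(NUMBER," ");
END;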
So what I need is only the number after the word "Incident". Unfortunately, I can't check your code right now. I assume the (I|i) part is there to handle both the capitalized and the lowercase spelling of "Incident"? And you use the PRXMATCH function; I think that's the only way if it isn't possible to write one PRXPARSE statement that covers both cases (start from the word "Incident", which should be possible with the "^" option, and from there the "\d"s). I'll let you know, thanks!
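For what it's worth, the two patterns can probably be folded into a single PRXPARSE call by putting the digits in a capture group right after the word. Below is a minimal sketch, not tested against the real data: the dataset name work.combined, the variable names and the sample lines are made up, and it assumes the wanted number is the first run of 4 or 5 digits following "Incident" with only non-digit characters in between. Note that "^" anchors the match to the very start of the string, so it is left out here because text may precede the word.

```sas
/* Sketch: one pattern that locates "Incident" and captures the number after it */
data work.combined (drop = re);
  infile datalines truncover;
  input text $60.;
  retain re;
  /* Compile once. \D* skips non-digits between the word and the number; */
  /* (\d{4,5}) is capture buffer 1 and holds the 4- or 5-digit number.   */
  if _n_ = 1 then re = prxparse('/Incident\D*(\d{4,5})/');
  /* PRXPOSN returns the captured text; INPUT converts it to numeric */
  if prxmatch(re, text) then number = input(prxposn(re, 1, text), 8.);
datalines;
12 old text Incident 12345 reported
Incident 4321
No number here
;
run;
```

Adding the i option after the closing slash ('/incident\D*(\d{4,5})/i') should make the match case-insensitive, which would replace the (I|i) alternation.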
This is a totally old-school example using INDEX and COMPRESS, but I threw in PRXMATCH (see below) to compare it with INDEX. Both PRXMATCH and INDEX return the same results (compare FOUNDIT and FOUNDIT2). The COMPRESS/SUBSTR approach is not as elegant as the other solution, but it does the job.
data prxtest;
  length grp $1 string $100;
  infile datalines dsd dlm=',';
  input grp $ string $;
datalines;
a,"The 1st Incident was when 12345 (Mr. Dumpty) fell off the wall."
b,"The 2nd Incident was when 34567 (Ms. Muffet) fell off a stool."
c,"Has the 123 word Incident, but there are no numbers after 'Incident'."
;
run;

proc print data=prxtest;
  title 'What does the data look like';
run;
data checkdata;
  set prxtest;
  length gotnum 8.;
  retain lookfor ;
  if _n_=1 then do;
    ** Create pattern with prxparse.;
    lookfor = prxparse('/Incident/');
  end;
  ** Prxmatch returns the location in Arg2,;
  ** where ARG1 begins.;
  ** Note how prxmatch and index return the same number;
  ** Do you really need prxparse/prxmatch?;
  ** Will Index function work for your data?;
  foundit = prxmatch(lookfor,string);
  foundit2 = index(string,'Incident');
  ** If the pattern has been found;
  ** substring out everything AFTER;
  ** the word "Incident". Then, compress;
  ** out the punctuation and upper and lower case letters.;
  ** What should be left are the numbers after the string Incident.;
  if foundit gt 0 then do;
    gotnum = input((compress(substr(string,foundit+8),'.,;)','al')),8.0);
  end;
  else do;
    gotnum = .;
  end;
run;
proc print data=checkdata;
  title 'Found "Incident" Got number';
run;
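A possible variation on the COMPRESS step, offered as a sketch rather than a drop-in replacement: instead of listing the characters to remove, the 'kd' modifiers tell COMPRESS to keep only digits, so stray spaces, parentheses and quotes cannot get in the way. The dataset name work.digits_only and the single hard-coded string are just for illustration, and the ?? modifier on INPUT is there to suppress invalid-data notes when nothing numeric is left.

```sas
/* Sketch: keep only the digits that follow the word "Incident" */
data work.digits_only;
  string = "The 1st Incident was when 12345 (Mr. Dumpty) fell off the wall.";
  pos = index(string, 'Incident');
  /* "Incident" is 8 characters long, so pos + 8 is the first character after it */
  if pos > 0 then
    gotnum = input(compress(substr(string, pos + 8), , 'kd'), ?? 8.);
  put gotnum=;
run;
```

For this string the log should show gotnum=12345. One caveat: every digit after "Incident" is kept, so a string with two separate numbers after the word would have them run together into one value.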