Wouter
Obsidian | Level 7
All,

I've got a data step in which I extract a number from a string. The strings change from time to time, but only in the part before the word "Incident" (sometimes some numbers appear there, but I don't need those). What I could do is write 2 data steps with these statements:

PATTERN = PRXPARSE("/^Incident/"); --> start from this word

PATTERN = PRXPARSE("/\d\d\d\d\d?/"); --> collect the desired number

But it must be possible to combine these statements, I think? That would save a lot of time and space! 🙂 Thanks in advance!
5 REPLIES
Olivier
Pyrite | Level 9
Hi Wouter.
Are you looking for something like this?
[pre]
DATA work.test (DROP = regExp) ;
INFILE CARDS DLM = ";" ;
INPUT text :$40. ;
RETAIN regExp ;
IF _N_=1 THEN regExp = PRXPARSE("/(I|i)ncident(\d+).?/") ;
IF PRXMATCH(regExp, text) THEN number = PRXPOSN(regExp, 2, text)+0 ;
CARDS ;
Incident124
No incident at all
Incident2 @ 12:00
IncidentABC
Incident3ABC
Incident 3ABC
;
RUN ;
[/pre]
Regards,
Olivier
Wouter
Obsidian | Level 7
Well, right now I'm using:

data test2;
set test1;
if _n_ = 1 then do;
PATTERN = PRXPARSE("/\d\d\d\d\d?/");

IF MISSING(PATTERN) THEN DO;
PUT "ERROR IN COMPILING REGULAR EXPRESSION";
STOP;
end;
end;
RETAIN PATTERN;
CALL PRXSUBSTR(PATTERN,test1,START,LENGTH);
IF START GT 0 THEN DO;
NUMBER = SUBSTR(test1,START,LENGTH);
NUMBER = COMPRESS(NUMBER," ");
OUTPUT;
END;
run;


Test1 contains data like:
Incident 43244
Incident 894232
43243 Incident 44322
Incident 23
988 Incident 4322


So what I need is only the number after the word "Incident". Unfortunately, I can't check your code right now. I assume the (I|i) part is there to handle both the capitalized and the lowercase spelling of "Incident"? And you use the PRXMATCH function; I think that's the only way if it isn't possible to write 1 PRXPARSE statement that covers both parts (start from the word "Incident", which should be possible with the "^" option, and from there the "\d"s). I'll let you know, thanks!
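For what it's worth, here is a minimal sketch of how the anchor behaves, using made-up sample strings. In Perl-style regular expressions `^` ties the match to the start of the string rather than meaning "start from this word", so a pattern beginning with `^Incident` cannot match a line that has digits in front of the word. Leaving the anchor out and adding the `i` suffix (case-insensitive matching) is one alternative to `(I|i)`. The names `re_anchored` and `re_free` are placeholders, not anything from the posts above.

```sas
data _null_;
    length text $40;
    /* Anchored: "Incident" must be the very first thing in the string. */
    re_anchored = prxparse('/^Incident\s*(\d+)/');
    /* Unanchored and case-insensitive: the word may appear anywhere. */
    re_free     = prxparse('/incident\s*(\d+)/i');
    /* Hypothetical sample strings, looped over one at a time. */
    do text = 'Incident 43244', '43243 Incident 44322', '988 incident 4322';
        hit_anchored = prxmatch(re_anchored, text);
        hit_free     = prxmatch(re_free, text);
        put text= hit_anchored= hit_free=;
    end;
run;
```

PRXMATCH returns the position where the match starts, or 0 when there is no match, so the log shows which pattern finds which string.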
Cynthia_sas
SAS Super FREQ
Hi,
This is a totally old-school example using INDEX and COMPRESS, but I threw in PRXMATCH (see below) to compare it with INDEX; both return the same results (compare FOUNDIT and FOUNDIT2). The COMPRESS/SUBSTR approach is not as elegant as the other solution, but it does the job.
cynthia

[pre]
data prxtest;
length grp $1 string $100;
infile datalines dsd dlm=',';
input grp $ string $;
return;
datalines;
a,"The 1st Incident was when 12345 (Mr. Dumpty) fell off the wall."
b,"The 2nd Incident was when 34567 (Ms. Muffet) fell off a stool."
c,"Has the 123 word Incident, but there are no numbers after 'Incident'."
;
run;

proc print data=prxtest;
title 'What does the data look like';
run;

data checkdata;
length gotnum 8.;
set prxtest;
retain lookfor ;

if _n_=1 then do;
** Create pattern with prxparse.;
lookfor = prxparse('/Incident/');
end;

** Prxmatch returns the location in Arg2,;
** where ARG1 begins.;
** Note how prxmatch and index return the same number;
** Do you really need prxparse/prxmatch?;
** Will Index function work for your data?;
foundit = prxmatch(lookfor,string);
foundit2 = index(string,'Incident');

** If the pattern has been found;
** substring out everything AFTER;
** the word "Incident". Then, compress;
** out the punctuation and upper and lower case letters.;
** What should be left are the numbers after the string Incident.;
if foundit gt 0 then do;
gotnum = input((compress(substr(string,foundit+8),'.,;:()','al')),8.0);
end;
else do;
gotnum = .;
end;
run;


proc print data=checkdata;
title 'Found "Incident" Got number';
run;
[/pre]
Wouter
Obsidian | Level 7
Hi Cynthia,

The strange thing is that with this code the result is always 1 when there's no number before the word "Incident"; otherwise I get 2 numbers that I can't relate to the numbers before "Incident".
Wouter
Obsidian | Level 7
Yes Olivier, thanks!!

I've changed the statement a little bit (because there's a space between the actual number and "Incident"), and it works perfectly!!

Right now, it is:

DATA work.test2 ;
set test;
RETAIN regExp ;
IF _N_=1 THEN regExp = PRXPARSE("/(I|i)ncident\s(\d+).?/") ;
IF PRXMATCH(regExp, text) THEN number = PRXPOSN(regExp, 2,text) ;
run;
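
One thing that may be worth checking: PRXPOSN returns a character value, so without the `+0` from the earlier reply NUMBER ends up as a character variable. The sketch below is one possible way to get a numeric result, assuming a small stand-in data set with hypothetical rows. It swaps `(I|i)` for the `i` suffix, which means the digits sit in capture group 1 instead of group 2. The output name `work.test3` is made up.

```sas
/* Hypothetical sample rows standing in for the real TEST data set. */
data work.test;
    infile datalines truncover;
    input text $40.;
    datalines;
Incident 43244
43243 Incident 44322
988 Incident 4322
;
run;

data work.test3 (drop = regExp);
    set work.test;
    retain regExp;
    /* The word, one whitespace character, then the digits in group 1. */
    if _n_ = 1 then regExp = prxparse('/incident\s(\d+)/i');
    /* INPUT converts the captured text into a numeric value. */
    if prxmatch(regExp, text) then number = input(prxposn(regExp, 1, text), 32.);
run;
```

Rows without a match simply keep NUMBER missing.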
