Hi,
Can someone help me with this? My data all follow the same format as the example below, and I'd like to pull the next word after matching a long string.
Example:
string = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.'
Output:
match1 = 8/21/20
match2 = 9/21/20
Brute force:
data have;
input string $80.;
datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;
data want;
set have;
i = find(string,"The shop will close on");
if i then match1 = input(scan(substr(string,i),6," ."),mmddyy10.);
i = find(string,"They will have until");
if i then match2 = input(scan(substr(string,i),5," ."),mmddyy10.);
format match1 match2 mmddyy10.;
drop i;
run;
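If the goal is literally "the next word after a known phrase", a regular expression with a capture group avoids counting word positions with SCAN. This is a minimal sketch, assuming the two lead-in phrases are fixed exactly as in the example and that a "word" is a run of letters, digits, underscores or slashes (so the trailing period after 8/21/20 is not captured). The variable names re1 and re2 are my own:

```sas
data have;
input string $80.;
datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;

data want;
set have;
length match1 match2 $10;
/* Assumption: a "word" is a run of [A-Za-z0-9_] or "/" characters, */
/* so the period after 8/21/20 is not part of the capture.          */
re1 = prxparse('/The shop will close on\s+([\w\/]+)/');
re2 = prxparse('/They will have until\s+([\w\/]+)/');
/* PRXPOSN returns capture buffer 1 from the match PRXMATCH just found */
if prxmatch(re1, string) then match1 = prxposn(re1, 1, string);
if prxmatch(re2, string) then match2 = prxposn(re2, 1, string);
drop re1 re2;
run;
```

Unlike the SCAN version, match1 and match2 stay character strings here; wrap them in INPUT(..., mmddyy10.) if you need SAS date values.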
Are you matching on keywords like "on" and "until"?
Something like this?
data _null_;
ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
start = 1;
stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(ExpressionID, start, stop, text, position, length);
do while (position > 0);
match = substr(text, position, length);
put match=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
run;
After a few struggles, this is what I came up with.
There is probably better code, but this is the best I can offer.
data want;
ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
start = 1;
stop = length(text);
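array match[2] $10; /* $10 fits mm/dd/yyyy; the default $8 would truncate it */
call prxnext(ExpressionID, start, stop, text, position, length);
i=0;
do while (position > 0 and i < dim(match)); /* stop once the array is full */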
i+1;
match[i] = substr(text, position, length);
put match[i]=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
keep match1 match2;
run;
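If the number of dates per string isn't known in advance, one option (a different output layout, not a fix to the code above) is to write one row per match instead of sizing an array. A sketch under that assumption; the data set name matches and the counter n are illustrative choices:

```sas
data matches;
length text $100 match $10;
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
rxid = prxparse('/\d{1,2}\/\d{1,2}\/\d{2,4}/');
start = 1;
stop = length(text);
n = 0;
/* PRXNEXT advances START past each match, so the loop visits every date */
call prxnext(rxid, start, stop, text, position, length);
do while (position > 0);
  n + 1;
  match = substr(text, position, length);
  date = input(match, mmddyy10.); /* two-digit years follow YEARCUTOFF */
  output;                          /* one observation per date found   */
  call prxnext(rxid, start, stop, text, position, length);
end;
format date mmddyy10.;
keep n match date;
run;
```

A long table like this can later be transposed back to match1, match2, ... with PROC TRANSPOSE if a wide layout is needed.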
You might be better off also saving the 'pretext' (the context preceding each date).
Example:
Find up to 3 'dates' in a slash date format, along with their pretexts.
data have;
infile cards truncover;
input text $char100.;
datalines;
1/1/2020 opening
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
Somebody opened on 20AUG2020, a day before me!
Last call
closing 8/22/2020
;
data want(keep=text date: pretext:);
set have;
array pretext(3) $100 ;
array datestr(3) $10;
rxid = prxparse ('/(\d{1,2}\/\d{1,2}\/\d{2,4})/');
start = 1;
stop = -1;
put /text=;
do index = 1 to dim(datestr);
prepos = start;
call prxnext(rxid, start, stop, text, position, length);
put prepos= start= stop= position= length=;
if position > prepos then
pretext(index) = substr(text, prepos, position-prepos);
if position > 0 then
datestr(index) = substr(text, position, length);
else
pretext(index) = substr(text, start);
if length = 0 then leave;
end;
run;
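A narrower variation on the pretext idea, closer to the question about keywords like "on" and "until": capture only the word immediately before each date, using two capture groups and PRXPOSN after each PRXNEXT call. This is a sketch, assuming the keyword is always a single word directly followed by whitespace and the date; the data set name keyword_dates is hypothetical:

```sas
data keyword_dates;
length text $100 keyword $20 datestr $10;
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
/* Buffer 1: the word before the date; buffer 2: the slash date itself */
rxid = prxparse('/(\w+)\s+(\d{1,2}\/\d{1,2}\/\d{2,4})/');
start = 1;
stop = length(text);
call prxnext(rxid, start, stop, text, position, length);
do while (position > 0);
  keyword = prxposn(rxid, 1, text); /* e.g. "on" or "until" */
  datestr = prxposn(rxid, 2, text);
  output;
  call prxnext(rxid, start, stop, text, position, length);
end;
keep keyword datestr;
run;
```

Filtering on keyword (for example `where keyword = 'until';`) then picks out a specific date without hard-coding word positions.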