Hi,
Can someone help me with this? My data always arrives in the same format as the example below. I'd like to pull the next word after each matching long string.
Example:
string = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.'
Output:
match1 = 8/21/20
match2 = 9/21/20
Are you matching on keywords like "on" and "until"?
Brute force:
data have;
input string $80.;
datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;
data want;
set have;
i = find(string,"The shop will close on");
if i then match1 = input(scan(substr(string,i),6," ."),mmddyy10.);
i = find(string,"They will have until");
if i then match2 = input(scan(substr(string,i),5," ."),mmddyy10.);
format match1 match2 mmddyy10.;
drop i;
run;
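If the two phrases are fixed, the same lookup can also be sketched with regular-expression capture groups instead of counting words with SCAN. This is only a sketch under assumptions: it reuses the `have` data set and `string` variable from the step above, the names `rx1` and `rx2` are made up for illustration, and it assumes the date always follows the phrase directly in m/d/y form.

```sas
data want;
    set have;
    /* Compile each pattern once and reuse the IDs on every row. */
    retain rx1 rx2;
    if _n_ = 1 then do;
        rx1 = prxparse('/The shop will close on\s+(\d{1,2}\/\d{1,2}\/\d{2,4})/');
        rx2 = prxparse('/They will have until\s+(\d{1,2}\/\d{1,2}\/\d{2,4})/');
    end;
    /* PRXMATCH returns the match position (0 if none);          */
    /* PRXPOSN then returns the text held in capture buffer 1.   */
    if prxmatch(rx1, string) then
        match1 = input(prxposn(rx1, 1, string), mmddyy10.);
    if prxmatch(rx2, string) then
        match2 = input(prxposn(rx2, 1, string), mmddyy10.);
    format match1 match2 mmddyy10.;
    drop rx1 rx2;
run;
```

Unlike the SCAN version, this does not depend on the number of words in each phrase, but it will leave `match1` or `match2` missing if the text after the phrase is not a slash-delimited date.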
Something like this?
data _null_;
ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
start = 1;
stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(ExpressionID, start, stop, text, position, length);
do while (position > 0);
match = substr(text, position, length);
put match=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
run;
After some struggle, this is what I came up with.
There is probably much better code, but this is the best I can offer.
data want;
ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
start = 1;
stop = length(text);
array match[2] $10;
call prxnext(ExpressionID, start, stop, text, position, length);
i=0;
/* Stop once the array is full so a third date cannot overrun it. */
do while (position > 0 and i < dim(match));
i+1;
match[i] = substr(text, position, length);
put match[i]=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
keep match1 match2;
run;
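The step above keeps the matches as character strings. If real SAS date values are wanted, as in the original request, a variation could convert each match with INPUT as it is found. This is a rough sketch, not a tested solution: the variable name `rx` is made up here, and the `??` modifier is used on the assumption that unparseable matches should quietly become missing values.

```sas
data want;
    text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
    rx = prxparse('/\d{1,2}\/\d{1,2}\/\d{2,4}/');
    array match[2];                 /* numeric, holds SAS date values */
    format match1 match2 mmddyy10.;
    start = 1;
    stop = length(text);
    call prxnext(rx, start, stop, text, position, length);
    /* Loop until the array is full or no further match is found. */
    do i = 1 to dim(match) while (position > 0);
        /* ?? suppresses notes and errors if the text is not a valid date. */
        match[i] = input(substr(text, position, length), ?? mmddyy10.);
        call prxnext(rx, start, stop, text, position, length);
    end;
    keep match1 match2;
run;
```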
You might be better off also saving the 'pretext' (the text that precedes each date).
Example:
Find up to 3 'dates' in a slash date format and their pretexts.
data have;
  infile cards truncover;
  input text $char100.;
datalines;
1/1/2020 opening
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
Somebody opened on 20AUG2020, a day before me!
Last call closing 8/22/2020
;

data want(keep=text date: pretext:);
  set have;
  array pretext(3) $100;
  array datestr(3) $10;
  rxid = prxparse('/(\d{1,2}\/\d{1,2}\/\d{2,4})/');
  start = 1;
  stop = -1;
  put / text=;
  do index = 1 to dim(datestr);
    prepos = start;
    call prxnext(rxid, start, stop, text, position, length);
    put prepos= start= stop= position= length=;
    if position > prepos then pretext(index) = substr(text, prepos, position-prepos);
    if position > 0 then datestr(index) = substr(text, position, length);
    else pretext(index) = substr(text, start);
    if length = 0 then leave;
  end;
run;