Hi,
Can someone help me with this? My data arrives in the same format as the example below, and I'd like to pull the next word (here, a date) that follows a known long string.
Example:
string = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.'
Output:
match1 = 8/21/20
match2 = 9/21/20
Brute force:
data have;
input string $80.;
datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;
data want;
set have;
i = find(string,"The shop will close on");
if i then match1 = input(scan(substr(string,i),6," ."),mmddyy10.);
i = find(string,"They will have until");
if i then match2 = input(scan(substr(string,i),5," ."),mmddyy10.);
format match1 match2 mmddyy10.;
drop i;
run;
Are you matching on keywords like "on" and "until"?
Something like this?
data _null_;
ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
start = 1;
stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(ExpressionID, start, stop, text, position, length);
do while (position > 0);
match = substr(text, position, length);
put match=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
run;
After some struggle, this is what I came up with.
There is probably much better code, but this is the best I can offer.
data want;
ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
start = 1;
stop = length(text);
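/* $10 leaves room for dates such as 10/21/2020 (the default length is 8) */
array match[2] $10;
call prxnext(ExpressionID, start, stop, text, position, length);
i=0;
/* Stop once the array is full to avoid a subscript-out-of-range error */
do while (position > 0 and i < dim(match));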
i+1;
match[i] = substr(text, position, length);
put match[i]=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
keep match1 match2;
run;
You might be better off saving more of the 'pretext' (the text that precedes each date).
Example:
Find up to 3 'dates' (the array size used in the code) in a slash date format, along with their pretexts.
data have;
  infile cards truncover;
  input text $char100.;
datalines;
1/1/2020 opening
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
Somebody opened on 20AUG2020, a day before me!
Last call closing 8/22/2020
;
data want(keep=text date: pretext:);
  set have;
  array pretext(3) $100 ;
  array datestr(3) $10;
  rxid = prxparse ('/(\d{1,2}\/\d{1,2}\/\d{2,4})/');
  start = 1;
  stop = -1;
  put /text=;
  do index = 1 to dim(datestr);
    prepos = start;
    call prxnext(rxid, start, stop, text, position, length);
    put prepos= start= stop= position= length=;
    if position > prepos then pretext(index) = substr(text, prepos, position-prepos);
    if position > 0
      then datestr(index) = substr(text, position, length);
      else pretext(index) = substr(text, start);
    if length = 0 then leave;
  end;
run;