Is the code below what you're after?
data check;
var="Patient reported 100 c headache 3f and nausea. MD noticed rash.";
output;
var="a2 Pt. Rptd. Backache. usd25";
output;
var="b25h of patient reported seeing spots.";
output;
var="3g Elevated pulse and 6d labored breathing.";
output;
var="Headache.";
output;
run;
data want;
if _n_=1 then _rx=prxparse('/(^|[^ab\d])(\d+)([^cd\d]|$)/oi');
retain _rx;
set check;
length string $20;
_check=var;
_check=compress(_check);
if prxmatch(_rx,_check) then
do;
string=prxposn(_rx,2,_check);
end;
run;
Thank you, Patrick!
Sorry, just one more scenario, for cases like this one below:
var="abc h3 4k Headache.";
There are 2 valid numbers. I'm OK with 3 being returned, but not 34. Is there any way to fix this?
Thanks!
Not really pretty but best I could come up with:
data want;
if _n_=1 then _rx=prxparse('/(^|[^ab\d])(\d+)([^cd\d]|$)/oi');
retain _rx;
set check;
length string $20;
_check=var;
_check=prxchange('s/(\d) +(\d)/\1\|\2/o',-1,_check);
_check=compress(_check);
if prxmatch(_rx,_check) then
do;
string=prxposn(_rx,2,_check);
end;
run;
Hi Patrick,
Thank you so much! It's mostly working, but something is still missing. I had to change my sample to better match my real data:
data check;
var="Patient reported .1 c headache 1.3f and nausea. MD noticed rash."; output;
var="abc2.2 Pt. Rptd. Backache. usd2.5"; output;
var="efg2.5h of patient reported seeing spots."; output;
var="1.3g Elevated pulse and 0.6d labored breathing."; output;
var="Headache."; output;
var="abc .3 5 4.5k Headache.";output;
run;
Expected Output:
var | string | _check |
Patient reported .1 c headache 1.3f and nausea. MD noticed rash. | 1.3 | Patientreported.1cheadache1.3fandnausea.MDnoticedrash. |
abc2.2 Pt. Rptd. Backache. usd2.5 | 2.5 | abc2.2Pt.Rptd.Backache.usd2.5 |
efg2.5h of patient reported seeing spots. | | efg2.5hofpatientreportedseeingspots. |
1.3g Elevated pulse and 0.6d labored breathing. | 1.3 | 1.3gElevatedpulseand0.6dlaboredbreathing. |
Headache. | | Headache. |
abc .3 5 4.5k Headache. | 4.5 | abc.3\|54.5kHeadache. |
data check;
var="Patient reported .1 c headache 1.3f and nausea. MD noticed rash."; output;
var="abc2.2 Pt. Rptd. Backache. usd2.5"; output;
var="efg2.5h of patient reported seeing spots."; output;
var="1.3g Elevated pulse and 0.6d labored breathing."; output;
var="Headache."; output;
var="abc .3 5 4.5k Headache."; output;
run;
data want;
set check;
length v $ 20;
retain pid;
if _n_ eq 1 then pid=prxparse('/\b[dh-z]+\d+\.?\d+\b|\b\d+\.?\d+[abe-z]+\b/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = compress(substr(var, position, length),'.','kd');
drop pid position length;
run;
Xia Keshan
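For readers unfamiliar with the COMPRESS modifiers used in the reply above, here is a minimal sketch of that one step in isolation. The input value 'usd2.5' is a hypothetical matched substring chosen for illustration; it stands in for whatever SUBSTR returns after CALL PRXSUBSTR finds a match.

```sas
data _null_;
  length v $20;
  /* Hypothetical matched substring, used only for illustration */
  piece = 'usd2.5';
  /* 'k' keeps the listed characters instead of removing them;
     'd' adds the digits 0-9 to that list. With '.' as the list,
     only digits and periods survive. */
  v = compress(piece, '.', 'kd');
  put v=;
run;
```

Because periods are kept as well, a sentence-ending period inside the matched substring would also survive this step, which is presumably why the later replies wrap the call in an extra PRXCHANGE cleanup.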
Thank you guys very much for your help!
I don't want to start a new discussion since this is another regular-expression question, though I wonder whether it's even possible. Is there a way to identify a word only if it's within, let's say, 3 words of another? I'm going to use the same sample (modified), so the requirement becomes this:
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab";output;
run;
Expected Output:
var | string |
Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash. | 1.3 |
ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od | 2.5 |
2.5h of ods patient reported seeing spots. | |
1.3g Elevated pulse ab ht and 0.6d labored breathing. | |
Headache. | |
ab ht .3 5 4 .5k Headache ab | .5 |
Thanks!
1) If you're looking for stuff like '.2', '3.2' but not '3' then your RegEx looks fine to me.
2) What is the delimiter for a word? A specific pattern within 3 word distance could be something like (untested): (\bods\b)(\b\w+){0,2}(\d*\.\d+)
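The untested pattern above leaves no room for the whitespace between words, so here is one way it could be filled in, as a sketch only. It assumes words are delimited by whitespace and that "within 3 words" means at most two intervening words between the keyword and the number. The test string is hypothetical and not taken from the sample data.

```sas
data _null_;
  length num $20;
  /* Hypothetical test string, for illustration only */
  text = "MD ods noticed rash 1.3f today";
  /* 'ods' as a whole word, then lazily up to two whitespace-delimited
     words, then whitespace and a decimal number such as .5 or 1.3 */
  rx = prxparse('/\bods\b(?:\s+\S+){0,2}?\s+(\d*\.\d+)/i');
  /* Capture buffer 1 holds the number when the pattern matches */
  if prxmatch(rx, text) then num = prxposn(rx, 1, text);
  put num=;
run;
```

This only looks for the number after the keyword. Covering a number that comes before the keyword would need a second, mirrored alternative.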
Thank you Patrick. I'll try that too.
You're really giving me a headache. I hope you don't post a question like this again.
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab"; output;
run;
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
Xia Keshan
Xia Keshan,
Thank you so much! That’s awesome!
One question: I believe you carried over [dh-z]+ and [abe-z]+ from my first question, right? So I tried to remove them, as in the code below, but I don't understand why it's not working. What did I do wrong?
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
Why do you want to remove that part? What is your purpose?
If you don't want that condition any more, try this one:
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab"; output;
run;
data want;
set check;
length v $ 20;
retain pid;
if _n_ eq 1 then pid=prxparse('/\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
Xia Keshan
The requirement was this:
There is no [dh-z] or [abe-z] in it. I think your previous code is right as long as these are removed, but I can't get it to work.
I don't know why it doesn't work.
Try removing the '?' after the '\.':
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.\d+\s+|\s+(\d+)?\.\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
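To see what removing the '?' after '\.' changes, here is a small sketch that compares the two number sub-patterns on their own. The three test values are hypothetical, and the patterns are anchored with ^ and $ only so that each value is tested as a whole.

```sas
data _null_;
  length s $8;
  /* Period optional: plain integers such as 5 also match */
  rx_optional = prxparse('/^(\d+)?\.?\d+$/');
  /* Period required: only values with a decimal part match */
  rx_required = prxparse('/^(\d+)?\.\d+$/');
  /* Hypothetical test values, for illustration only */
  do s = '5', '.5', '4.5';
    /* STRIP removes the trailing blanks that would defeat the $ anchor */
    opt = (prxmatch(rx_optional, strip(s)) > 0);
    req = (prxmatch(rx_required, strip(s)) > 0);
    put s= opt= req=;
  end;
run;
```

With the period optional, a bare integer such as the 5 or 4 in "ab ht .3 5 4 .5k" can satisfy the number part of the pattern, which is one plausible reason the match changes once the letter classes are removed.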
Hi xia keshan,
Sorry, am I misunderstanding [dh-z]? Is it there for the "not started with 'abc', 'efg'" rule from the previous example? That's not a requirement for this task, and neither is [abe-z], so I want to remove them from the code you provided, but I can't make it work.
The only 2 requests for this task are:
Thanks.