This is the code I tried, but it gives the wrong result:
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+(\d+)?\.\d+\s+|\s+(\d+)?\.\d+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
proc print data=want; run;
OK. Try this one:
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab"; output;
run;
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+\S*\s+\S*\s+[a-z]+\d*\.\d+\s+|\s+\d*\.\d+[abe-z]+\s+\S*\s+\S*\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
Xia Keshan
Thanks. I just want to understand your code completely. I simply changed [abe-z] to [a-z]; how come it no longer works for the 4th record?
Thanks!
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+\S*\s+\S*\s+[a-z]+\d*\.\d+\s+|\s+\d*\.\d+[a-z]+\s+\S*\s+\S*\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
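A minimal sketch of why the change matters, assuming record 4 as it looks after the "ab ht" replacement. The class [abe-z] is a, b, and e through z, so it leaves out c and d (and, with the i flag, C and D). The suffix in "0.6d" is d, so the original class rejects it while [a-z] accepts it. The two patterns below isolate only the number-plus-suffix fragment of the second alternative; they are a simplification, not the full expression, and the variable names narrow and wide are made up for illustration.

```sas
data _null_;
   length var $ 80;
   /* Record 4 after "ab ht" has been replaced with a pipe. */
   var = "1.3g Elevated pulse | and 0.6d labored breathing.";

   /* Suffix class without c and d: the d after 0.6 cannot match. */
   narrow = prxmatch('/\s+\d*\.\d+[abe-z]+/i', var);

   /* Suffix class with every letter: 0.6d now qualifies. */
   wide = prxmatch('/\s+\d*\.\d+[a-z]+/i', var);

   /* PRXMATCH returns 0 when there is no match, otherwise the start position. */
   put narrow= wide=;
run;
```

The leading \s+ keeps "1.3g" at the start of the string out of play in both cases, so any difference between the two results comes from the suffix class alone.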
Oh, I missed one thing.
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+\S*\s+\S*\s+[a-z]+\d*\.\d+\s+|\s+\d*\.\d+[a-z]+\s+\S*\s+\S*\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
But 0.6 is returned for record 4, which it shouldn't be.
What should it return?
Even though I love RegEx questions, this is why I've stopped answering in this thread. I really believe the OP now needs to do his homework: analyze the data, fully define the extraction rules, provide sample data that covers all the cases, and provide the expected result. Otherwise this will go on and on and on...
I couldn't agree with you more. The OP keeps dumping hard rules on us, as if trying to stump us.
Xia Keshan
It has "ab ht" around the numbers, so it should return nothing.
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
OK. Try this one.
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab";output;
run;
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+[^|]*\s+[^|]*\s+(\S*\d*\.\d+)\S*\s+[^|]*\s+[^|]*\s+[^|]/');
if prxmatch(pid,var) then v=compress(prxposn(pid,1,var),'.','kd');
drop pid ;
run;
May I know what your RegEx does here?
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+[^|]*\s+[^|]*\s+(\S*\d*\.\d+)\S*\s+[^|]*\s+[^|]*\s+[^|]/');
if prxmatch(pid,var) then v=compress(prxposn(pid,1,var),'.','kd');
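One way to read those three statements, written out as a commented step on the first sample record. This is a sketch of what each piece of the expression matches, not a statement of the author's intent; the data set name demo and the PUT statement are added for illustration only.

```sas
data demo;
   length var $ 100 v $ 20;
   retain pid;
   var = "Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash.";

   /* Replace every "ab ht" or "ods" marker with a pipe (-1 = replace all
      occurrences; the i flag ignores case). */
   var = prxchange('s/\bab\s+ht\b|\bods\b/|/i', -1, var);

   /* Compile the pattern once on the first row. Read left to right:
        [^|]            one character that is not a pipe
        \s+ [^|]* \s+ [^|]* \s+
                        whitespace-separated runs of non-pipe characters
                        (each run may be empty and may itself contain blanks)
        (\S*\d*\.\d+)   group 1: optional non-blank prefix, optional digits,
                        a decimal point, then at least one digit
        \S*             optional non-blank suffix such as a unit letter
        \s+ [^|]* \s+ [^|]* \s+ [^|]
                        the same non-pipe context on the right-hand side */
   if _n_ eq 1 then pid = prxparse('/[^|]\s+[^|]*\s+[^|]*\s+(\S*\d*\.\d+)\S*\s+[^|]*\s+[^|]*\s+[^|]/');

   /* If the pattern matches, keep only digits and dots from group 1. */
   if prxmatch(pid, var) then v = compress(prxposn(pid, 1, var), '.', 'kd');

   put var= v=;
   drop pid;
run;
```

Because every piece of context is built from [^|], a pipe that falls inside that stretch on either side of the number blocks the match, which is how the replaced "ab ht" and "ods" markers keep nearby numbers from being extracted.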
data check;
var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
var="a2 Pt. Rptd. Backache. usd25"; output;
var="b25h of patient reported seeing spots."; output;
var="3g Elevated pulse and 6d labored breathing."; output;
var="Headache."; output;
var="abc h3 4k Headache.";output;
run;
data want;
set check;
length v $ 20;
retain pid;
if _n_ eq 1 then pid=prxparse('/\b[c-z]+\d+\b|\b\d+[abe-z]+\b/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = compress(substr(var, position, length),,'kd');
drop pid position length;
run;
Xia Keshan
Thanks, Xia Keshan. Please see my last post.