Extract Values using PERL Regular Expression

I would like to extract serum creatinine values from these strings. My regular expression does not work. How can I fix it?

data _NULL_;
   str1 = 'Serum creatinine: 3160'; * WANT: 3160;
   str2 = 'Serum creatinine is 3160'; * WANT: 3160;
   str3 = '72(ref range 44-106)'; *WANT: 72 and 44-106;
   str4 = '133 H umol/l (49-93)'; *WANT: 133 and 49-93;
   str5 = '80(Ref. Int. 52-112 umol/L)'; *WANT: 80 and 52-112;
   str6 = 'TEST RESULT\.br\COLLECTION DATE 6-FEB-2014\.br\24 HOUR URINE VOLUME 0.100\.br\\.br\SERUM CREATININE LEVEL 511 HI 64 - 110'; *WANT: 511 and 64-110;

   reg_ex = prxparse("/((?:.*?)(?=CREATININE)?(?:.*?))\s*(\d+)\s*(\d+\s*-\s*\d+)?/oi");
   reg_match = prxmatch(reg_ex, str2);
   paren = prxparen(reg_ex);
   put paren=;
run;


Re: Extract Values using PERL Regular Expression

Posted in reply to verdantsphinx

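I am not sure this can be accomplished with a single regex in SAS, since you would need something like a conditional regular expression, which as far as I know SAS Perl regex does not support. (The /o flag is also unnecessary here: SAS compiles a constant pattern only once anyway.) So it is easier to split the task into two parts: first try a pattern anchored on the word "creatinine", and fall back to a plain number pattern when that word is absent.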

data a;
	array str[*] $200 str1-str6;
	str1 = 'Serum creatinine: 3160';  * WANT: 3160;
	str2 = 'Serum creatinine is 3160';  * WANT: 3160;
	str3 = '72(ref range 44-106)';  *WANT: 72 and 44-106;
	str4 = '133 H umol/l (49-93)'; *WANT: 133 and 49-93;
	str5 = '80(Ref. Int. 52-112 umol/L)'; *WANT: 80 and 52-112;
	str6 = 'TEST                     RESULT\.br\COLLECTION DATE 6-FEB-2014\.br\24 HOUR URINE VOLUME 0.100\.br\\.br\SERUM CREATININE LEVEL   511      HI 64 - 110'; *WANT: 511 and 64-110;

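	/* Pattern 1: anchor on the word "creatinine", then capture the first number and an optional range */
	regex1 = prxparse('/(creatinine).*?(\d+)\D*(\d+\s*-\s*\d+)?/i');
	/* Pattern 2 (fallback): first number in the string and an optional range */
	regex2 = prxparse('/(\d+)\D*(\d+\s*-\s*\d+)?/');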

	length match $200;

	do i = 1 to 6;
		if prxmatch(regex1, str[i]) then do;
			match = catx(' and ', prxposn(regex1, 2, str[i]), prxposn(regex1, 3, str[i]));
			put 'Found match with creatinine word: ' match;
		end;
		else if prxmatch(regex2, str[i]) then do;
			match = catx(' and ', prxposn(regex2, 1, str[i]), prxposn(regex2, 2, str[i]));
			put 'Found match without creatinine word: ' match;
		end;
		output;
	end;
 run;

This works for your examples. See log and dataset for details.
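
If you want the results in variables rather than only in the log, the same two-pattern idea can be applied row by row. This is just a sketch under a few assumptions: the dataset names creat_raw and creat_parsed, the variable names str, value and range, and the three sample rows are made up for illustration. The patterns are the ones above, minus the capture group around the word "creatinine", so the buffer numbers are 1 and 2 in both cases.

```sas
/* Hypothetical input: one free-text result per row in variable STR */
data creat_raw;
   length str $200;
   str = 'Serum creatinine: 3160'; output;
   str = '72(ref range 44-106)';   output;
   str = '133 H umol/l (49-93)';   output;
run;

data creat_parsed;
   set creat_raw;
   length range $40;
   retain rx_word rx_plain;

   /* Compile both patterns on the first row and keep the ids */
   if _n_ = 1 then do;
      rx_word  = prxparse('/creatinine.*?(\d+)\D*(\d+\s*-\s*\d+)?/i');
      rx_plain = prxparse('/(\d+)\D*(\d+\s*-\s*\d+)?/');
   end;

   /* Try the pattern anchored on "creatinine" first, then the fallback */
   if prxmatch(rx_word, str) then do;
      value = input(prxposn(rx_word, 1, str), ?? best32.);  /* numeric result */
      range = compress(prxposn(rx_word, 2, str));           /* blank if no range */
   end;
   else if prxmatch(rx_plain, str) then do;
      value = input(prxposn(rx_plain, 1, str), ?? best32.);
      range = compress(prxposn(rx_plain, 2, str));
   end;

   drop rx_word rx_plain;
run;

proc print data=creat_parsed;
run;
```

Rows that match neither pattern keep a missing value and a blank range, so they are easy to filter out afterwards.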

Also, I suggest using something like regex101.com to compose regular expressions before using them in SAS. It will save you time, since you see the results immediately. For example, this is for your case: https://regex101.com/r/WIVWa8/1


Re: Extract Values using PERL Regular Expression

Posted in reply to verdantsphinx

Hi,

First step: create a vertical structure of the strings (one string per row).

data aatest;
   length str1 $200;
   str1 = 'Serum creatinine: 3160';      *WANT: 3160;output;
   str1 = 'Serum creatinine is 3160';    *WANT: 3160;output;
   str1 = '72(ref range 44-106)';        *WANT: 72 and 44-106;output;
   str1 = '133 H umol/l (49-93)';        *WANT: 133 and 49-93;output;
   str1 = '80(Ref. Int. 52-112 umol/L)'; *WANT: 80 and 52-112;output;
   str1 = 'TEST                     RESULT\.br\COLLECTION DATE 6-FEB-2014\.br\24 HOUR URINE VOLUME 0.100\.br\\.br\SERUM CREATININE LEVEL   511      HI 64 - 110'; *WANT: 511 and 64-110;output;
run;

Second, standardize the chaos :)

data aaatest;
   length out $200;
   set aatest;

   regId=prxparse('s/^(?:.*creatinine\s*.[^\d]*\s*)?(\d+).[^\d]*((\d+)\s*(-)\s*(\d+))?.*/$1 $2/i');
   match=prxmatch(regId,str1);
   out=prxchange(regId,-1,str1);
   value=scan(out,1);
   range=compress(substr(out,find(out,' ')));
   if not match then put 'E' 'RROR: unrecognized string pattern, please check' str1=;
run;

tadaaa!
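
One way to check the result without eyeballing the dataset is to put the WANT values next to the strings and compare. A minimal sketch, assuming made-up names (expected, check, want, got) and only three of the sample strings; the substitution pattern is the one used above.

```sas
/* Hypothetical test data: input string plus the result we want */
data expected;
   length str1 want $200;
   str1 = 'Serum creatinine: 3160';      want = '3160';          output;
   str1 = '72(ref range 44-106)';        want = '72 and 44-106'; output;
   str1 = '80(Ref. Int. 52-112 umol/L)'; want = '80 and 52-112'; output;
run;

data check;
   length out got $200;
   set expected;

   regId = prxparse('s/^(?:.*creatinine\s*.[^\d]*\s*)?(\d+).[^\d]*((\d+)\s*(-)\s*(\d+))?.*/$1 $2/i');
   out = prxchange(regId, -1, str1);

   /* Rebuild "value and range" in the same form as the WANT comments */
   got = catx(' and ', scan(out, 1), compress(substr(out, find(out, ' '))));

   /* Report any string where the extraction differs from what we want */
   if got ne want then put 'W' 'ARNING: mismatch ' str1= got= want=;

   drop regId;
run;
```

If the log stays free of mismatch warnings, the pattern handles the listed strings; new string formats can be added to the expected dataset as they turn up.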

(Attached screenshot: regexResult.png)
