Hi,
I've got blood test results as text (a string column named TXT), from which I need to extract just the measurement units, i.e. "K/UL", "M/UL", "%", etc., from lines like the following:
WBC 4.27-11.40 k/uL 3.64 (L)
RBC 3.90-5.03 m/uL 4.30
Hemoglobin 10.6-13.4 g/dL 13.0
Hematocrit 32.2-39.8 % 36.1
MCV 74.4-87.6 fL 84.0
MCH 24.8-29.5 pG 30.2 (H)
I wrote this code:
data ds;
set data;
retain re_units;
if _N_=1 then do;re_units = prxparse("~\d+-\d[\d.]*\s*\K\S+~s");end;
if missing(re_units) then do; putlog "INVALID REGEX" ;end;
do i=1 to 10;
if prxmatch(re_units, TXT) then do; units = prxposn(re_units,i,TXT);end;
output;
end;
run;
This always writes "INVALID REGEX" to the log. However, the same pattern works without problems in a regex simulator - see this. I don't know why this is happening.
Hi,
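I think it's because \K is not supported in SAS. The PRX functions are based on a modified version of Perl 5.6.1, and \K was only added in Perl 5.10, so prxparse fails to compile the pattern and returns a missing value.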
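If \K is indeed the culprit, one possible workaround is to put the unit in a capture group and read it back with prxposn. The following is only a minimal sketch: it assumes an input data set named have with a character column txt, and the output data set name units and the variable length of 20 are arbitrary choices for illustration.

```sas
data units;
  set have;                 /* assumed input: character column txt */
  length unit $20;          /* assumed maximum length of a unit */
  retain re_units;
  /* Compile once. Group 1 captures the first non-blank token after
     the "low-high" reference range, replacing the unsupported \K. */
  if _N_ = 1 then re_units = prxparse('/\d[\d.]*-\d[\d.]*\s+(\S+)/');
  /* prxmatch returns 0 when there is no match, leaving unit blank */
  if prxmatch(re_units, txt) then unit = prxposn(re_units, 1, txt);
  drop re_units;
run;
```

Because the range is matched with \s+ after it, this sketch should not depend on how many blanks separate the columns.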
data have;
length txt $200;
txt='cd34 8.-9.. µg/m² 30.2 (?=)(/&%$§")';output;
txt='M a r s h ma llow 8-9 Kg/day something !+-*/ 45.0-5.4 x(/&%$§")';output;
txt='cd34+ 8-9 µg/m² 30.2 (?=)(/&%$§")';output;
txt='WBC 4.27-11.40 k/uL 3.64 (L)';output;
txt='RBC 3.90-5.03 m/uL 4.30 m/uL';output;
txt='Hemoglobin 10.6-13.4 g/dL 13.0';output;
txt='Hematocrit 32.2-39.8 % 36.1';output;
txt='MCV 74.4-87.6 fL 84.0';output;
txt='MCH 24.8-29.5 pG 30.2 (H)';output;
txt='MCHC 31.8-34.9 g/dL 36.0 (H)';output;
txt='RDW-CV 12.2-14.4 % 13.2';output;
txt='Platelet Count 150-400 k/uL 175';output;
txt='MPV 9.2-11.4 fL 8.6 (L)';output;
txt='Neut% 28.6-74.5 % 43.1';output;
txt='Abs Neut (ANC) 1.63-7.87 k/uL 1.57 (L)';output;
txt='Lymph% 15.5-57.8 % 43.7';output;
txt='Abs Lymph 0.97-4.28 k/uL 1.59';output;
txt='Mono% 4.2-12.3 % 9.3';output;
txt='Abs Mono 0.19-0.85 k/uL 0.34';output;
txt='Eosin% 0.0-4.7 % 3.6';output;
txt='Abs Eosin 0.00-0.52 k/uL 0.13';output;
txt='Baso% 0.0-0.7 % 0.3';output;
txt='Abs Baso 0.00-0.06 k/uL 0.01';output;
run;
data want;
set have;
unit=prxchange('s/(([^\s]+\s)+)\s{2,}([\d.-]+)\s+(\S+)\s+.*/$4/',-1,txt);
if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
put unit=;
run;
unit=µg/m²
unit=Kg/day
unit=µg/m²
unit=k/uL
unit=m/uL
unit=g/dL
unit=%
unit=fL
unit=pG
unit=g/dL
unit=%
unit=k/uL
unit=fL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
This works too:
data want;
set have;
unit=scan(scan(prxchange('s/\s{3,}/#/',-1,strip(txt)),2,'#'),2,' ');
if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
put unit=;
run;
- Cheers -
How about this one?
data have;
input txt $80.;
cards;
WBC 4.27-11.40 k/uL 3.64 (L)
RBC 3.90-5.03 m/uL 4.30
Hemoglobin 10.6-13.4 g/dL 13.0
Hematocrit 32.2-39.8 % 36.1
MCV 74.4-87.6 fL 84.0
MCH 24.8-29.5 pG 30.2 (H)
;
data want;
set have;
pid=prxparse('/\d+\.\d+\-\d+\.\d+\s+\S+/');
call prxsubstr(pid,txt,p,l);
if p then want=scan(substr(txt,p,l),-1,' ');
drop pid p l;
run;