Hi,
I have blood test results stored as text (a string column named TXT), and I need to extract just the measurement units, i.e. "K/UL", "M/UL", "%", etc., from lines like the following:
WBC 4.27-11.40 k/uL 3.64 (L)
RBC 3.90-5.03 m/uL 4.30
Hemoglobin 10.6-13.4 g/dL 13.0
Hematocrit 32.2-39.8 % 36.1
MCV 74.4-87.6 fL 84.0
MCH 24.8-29.5 pG 30.2 (H)
I wrote this code:
data ds;
set data;
retain re_units;
if _N_=1 then do;re_units = prxparse("~\d+-\d[\d.]*\s*\K\S+~s");end;
if missing(re_units) then do; putlog "INVALID REGEX" ;end;
do i=1 to 10;
if prxmatch(re_units, TXT) then do; units = prxposn(re_units,i,TXT);end;
output;
end;
run;
This always writes "INVALID REGEX" to the log. However, the same pattern works without problems in a regex simulator - see this. I don't know why this is happening.
Hi,
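I think it's because \K is not supported by the SAS PRX functions, so PRXPARSE fails to compile the pattern and returns a missing value. The following works without \K: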
data have;
length txt $200;
txt='cd34 8.-9.. µg/m² 30.2 (?=)(/&%$§")';output;
txt='M a r s h ma llow 8-9 Kg/day something !+-*/ 45.0-5.4 x(/&%$§")';output;
txt='cd34+ 8-9 µg/m² 30.2 (?=)(/&%$§")';output;
txt='WBC 4.27-11.40 k/uL 3.64 (L)';output;
txt='RBC 3.90-5.03 m/uL 4.30 m/uL';output;
txt='Hemoglobin 10.6-13.4 g/dL 13.0';output;
txt='Hematocrit 32.2-39.8 % 36.1';output;
txt='MCV 74.4-87.6 fL 84.0';output;
txt='MCH 24.8-29.5 pG 30.2 (H)';output;
txt='MCHC 31.8-34.9 g/dL 36.0 (H)';output;
txt='RDW-CV 12.2-14.4 % 13.2';output;
txt='Platelet Count 150-400 k/uL 175';output;
txt='MPV 9.2-11.4 fL 8.6 (L)';output;
txt='Neut% 28.6-74.5 % 43.1';output;
txt='Abs Neut (ANC) 1.63-7.87 k/uL 1.57 (L)';output;
txt='Lymph% 15.5-57.8 % 43.7';output;
txt='Abs Lymph 0.97-4.28 k/uL 1.59';output;
txt='Mono% 4.2-12.3 % 9.3';output;
txt='Abs Mono 0.19-0.85 k/uL 0.34';output;
txt='Eosin% 0.0-4.7 % 3.6';output;
txt='Abs Eosin 0.00-0.52 k/uL 0.13';output;
txt='Baso% 0.0-0.7 % 0.3';output;
txt='Abs Baso 0.00-0.06 k/uL 0.01';output;
run;
data want;
set have;
unit=prxchange('s/(([^\s]+\s)+)\s{2,}([\d.-]+)\s+(\S+)\s+.*/$4/',-1,txt);
if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
put unit=;
run;
unit=µg/m²
unit=Kg/day
unit=µg/m²
unit=k/uL
unit=m/uL
unit=g/dL
unit=%
unit=fL
unit=pG
unit=g/dL
unit=%
unit=k/uL
unit=fL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
This works too:
data want;
set have;
unit=scan(scan(prxchange('s/\s{3,}/#/',-1,strip(txt)),2,'#'),2,' ');
if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
put unit=;
run;
- Cheers -
How about this one?
data have;
input txt $80.;
cards;
WBC 4.27-11.40 k/uL 3.64 (L)
RBC 3.90-5.03 m/uL 4.30
Hemoglobin 10.6-13.4 g/dL 13.0
Hematocrit 32.2-39.8 % 36.1
MCV 74.4-87.6 fL 84.0
MCH 24.8-29.5 pG 30.2 (H)
;
data want;
set have;
pid=prxparse('/\d+\.\d+\-\d+\.\d+\s+\S+/');
call prxsubstr(pid,txt,p,l);
if p then want=scan(substr(txt,p,l),-1,' ');
drop pid p l;
run;
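For anyone who wants to stay close to the pattern in the original question: a minimal sketch, assuming the problem really is the \K escape (as far as I know, SAS documents its PRX engine as based on Perl 5.6.1, while \K only arrived in Perl 5.10). Instead of "forgetting" the matched range with \K, wrap the unit in a capture group and read it back with PRXPOSN. The dataset names have/want and the variable units are only illustrative, and the range part of the pattern is slightly loosened so that integer ranges such as 150-400 match too.

```sas
/* Sample rows taken from the thread; TRUNCOVER keeps short lines intact. */
data have;
  infile datalines truncover;
  input txt $80.;
  datalines;
WBC 4.27-11.40 k/uL 3.64 (L)
Hematocrit 32.2-39.8 % 36.1
Platelet Count 150-400 k/uL 175
MCH 24.8-29.5 pG 30.2 (H)
;
run;

data want;
  set have;
  length units $20;
  retain re_units;
  /* Compile the pattern once. Capture group 1 holds the first    */
  /* non-blank token that follows the "low-high" reference range. */
  if _n_ = 1 then do;
    re_units = prxparse('/\d[\d.]*-\d[\d.]*\s+(\S+)/');
    if missing(re_units) then do;
      putlog 'INVALID REGEX';
      stop;
    end;
  end;
  /* PRXPOSN returns capture buffer 1 of the most recent match. */
  if prxmatch(re_units, txt) then units = prxposn(re_units, 1, txt);
  drop re_units;
run;
```

Rows that do not match simply keep a blank units value, which makes them easy to spot afterwards. Note also that the loop over i=1 to 10 in the original code is not needed here: the pattern has a single capture group, so only buffer 1 is ever populated.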