shakednav
Fluorite | Level 6

Hi,

 

I have a blood test report stored as a string (a column named TXT), and I need to extract just the units of measure from it, i.e. "k/uL", "m/uL", "%", etc. Here is the text:

 

WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)

I wrote this code:

data ds;
	set data;
	retain re_units;
	if _N_=1 then do;re_units = prxparse("~\d+-\d[\d.]*\s*\K\S+~s");end;
	if missing(re_units) then do; putlog "INVALID REGEX" ;end;
    do i=1 to 10;
	    if prxmatch(re_units, TXT) then do; units = prxposn(re_units,i,TXT);end;
		output;
	end;
run;

It always writes "INVALID REGEX" to the log, yet the same pattern works without any problem in an online regex tester (see this). I don't know why this is happening.

Ksharp
Super User
Can't SCAN() do it?

units = scan(txt,-1,' ');
shakednav
Fluorite | Level 6
Nope. Keep in mind that the text is one long string in each row.
Oligolas
Barite | Level 11

Hi,

I think it's because \K is not supported by the SAS Perl regular expression (PRX) functions.
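For comparison, here is a minimal sketch of the question's own approach with \K swapped for a capture group, assuming \K is indeed what PRXPARSE rejects. PRXPOSN then returns capture buffer 1, which holds the unit. The data set names HAVE and WANT and the three sample rows are illustrative; like the original, this returns only the first unit in each TXT value.

```sas
/* Sketch, assuming \K is what PRXPARSE rejects: the unit goes into     */
/* capture group 1 instead, and PRXPOSN returns that buffer.            */
/* HAVE, WANT and the sample rows are illustrative names/data only.     */
data have;
   length txt $200;
   txt = 'WBC        4.27-11.40 k/uL       3.64 (L)'; output;
   txt = 'RBC        3.90-5.03 m/uL        4.30';     output;
   txt = 'Hematocrit 32.2-39.8 %           36.1';     output;
run;

data want;
   set have;
   length units $20;
   retain re_units;
   if _n_ = 1 then do;
      /* Same idea as the original pattern, with (\S+) in place of \K\S+ */
      re_units = prxparse('/\d+-\d[\d.]*\s*(\S+)/');
      if missing(re_units) then putlog 'INVALID REGEX';
   end;
   /* PRXMATCH finds the first range followed by a unit; buffer 1 is the unit */
   if prxmatch(re_units, txt) then units = prxposn(re_units, 1, txt);
   drop re_units;
run;
```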

Try this:

 

data have;
length txt $200;
txt='cd34     8.-9.. µg/m²    30.2 (?=)(/&%$§")';output;
txt='M a r s h ma llow           8-9 Kg/day    something     !+-*/    45.0-5.4 x(/&%$§")';output;
txt='cd34+           8-9 µg/m²    30.2 (?=)(/&%$§")';output;
txt='WBC                            4.27-11.40 k/uL                        3.64 (L)';output;
txt='RBC                            3.90-5.03 m/uL                         4.30 m/uL';output;
txt='Hemoglobin                     10.6-13.4 g/dL                         13.0';output;
txt='Hematocrit                     32.2-39.8 %                            36.1';output;
txt='MCV                            74.4-87.6 fL                           84.0';output;
txt='MCH                            24.8-29.5 pG                           30.2 (H)';output;
txt='MCHC                           31.8-34.9 g/dL                         36.0 (H)';output;
txt='RDW-CV                         12.2-14.4 %                            13.2';output;
txt='Platelet Count                 150-400 k/uL                           175';output;
txt='MPV                            9.2-11.4 fL                            8.6 (L)';output;
txt='Neut%                          28.6-74.5 %                            43.1';output;
txt='Abs Neut (ANC)                 1.63-7.87 k/uL                         1.57 (L)';output;
txt='Lymph%                         15.5-57.8 %                            43.7';output;
txt='Abs Lymph                      0.97-4.28 k/uL                         1.59';output;
txt='Mono%                          4.2-12.3 %                             9.3';output;
txt='Abs Mono                       0.19-0.85 k/uL                         0.34';output;
txt='Eosin%                         0.0-4.7 %                              3.6';output;
txt='Abs Eosin                      0.00-0.52 k/uL                         0.13';output;
txt='Baso%                          0.0-0.7 %                              0.3';output;
txt='Abs Baso                       0.00-0.06 k/uL                         0.01';output;
run;

data want;
   set have;
   unit=prxchange('s/(([^\s]+\s)+)\s{2,}([\d.-]+)\s+(\S+)\s+.*/$4/',-1,txt);
   if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
   put unit=;
run;

unit=µg/m²
unit=Kg/day
unit=µg/m²
unit=k/uL
unit=m/uL
unit=g/dL
unit=%
unit=fL
unit=pG
unit=g/dL
unit=%
unit=k/uL
unit=fL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL

 

This works too:

data want;
   set have;
   unit=scan(scan(prxchange('s/\s{3,}/#/',-1,strip(txt)),2,'#'),2,' ');
   if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
   put unit=;
run;
________________________

- Cheers -

Ksharp
Super User

How about this one?

 

data have;
input txt $80.;
cards;
WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
;
data want;
 set have;
 pid=prxparse('/\d+\.\d+\-\d+\.\d+\s+\S+/');
 call prxsubstr(pid,txt,p,l);
 if p then want=scan(substr(txt,p,l),-1,' ');
 drop pid p l;
run;
shakednav
Fluorite | Level 6
Oh cool, but it only yields the very first unit ("k/uL").
How can I iterate through all of the matches?
shakednav
Fluorite | Level 6
Please keep in mind that the whole blood test is in a single column named TXT (with no line breaks).
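To get every unit out of a single long TXT value, one option is to loop with CALL PRXNEXT, which moves its start argument past each match, and to read capture buffer 1 with PRXPOSN after each call. This is a sketch that assumes each result is written as "low-high unit"; the sample string and the names HAVE, WANT, N and UNIT are hypothetical.

```sas
/* Sketch: collect every unit from one long TXT value with CALL PRXNEXT. */
/* Assumes each result looks like "<low>-<high> <unit>"; HAVE, WANT, N   */
/* and UNIT are hypothetical names, and the sample string is made up     */
/* from the rows posted in the question.                                 */
data have;
   length txt $500;
   /* One long value with no line breaks, as described in the thread */
   txt = catx(' ', 'WBC 4.27-11.40 k/uL 3.64 (L)',
                   'RBC 3.90-5.03 m/uL 4.30',
                   'Hemoglobin 10.6-13.4 g/dL 13.0',
                   'Hematocrit 32.2-39.8 % 36.1',
                   'MCV 74.4-87.6 fL 84.0',
                   'MCH 24.8-29.5 pG 30.2 (H)');
run;

data want(keep=n unit);
   set have;
   length unit $20;
   retain re;
   /* Capture buffer 1 = the token right after a range such as 4.27-11.40 */
   if _n_ = 1 then re = prxparse('/\d[\d.]*-\d[\d.]*\s+(\S+)/');
   startpos = 1;
   stoppos = length(txt);
   n = 0;
   /* First search; PRXNEXT updates STARTPOS so the next call resumes after the match */
   call prxnext(re, startpos, stoppos, txt, pos, len);
   do while (pos > 0);
      n + 1;
      unit = prxposn(re, 1, txt);
      output;
      call prxnext(re, startpos, stoppos, txt, pos, len);
   end;
run;
```

For this sample it should write one row per test, with UNIT values k/uL, m/uL, g/dL, %, fL and pG.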
Ksharp
Super User
Post more data so I can test the code.
