Hi,
I've got a blood test text file (as a string, in a column named TXT) and I need to extract just the measurement units from it, e.g. "K/UL", "M/UL", "%", etc., from the following:
WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
I wrote this code:
data ds;
	set data;
	retain re_units;
	if _N_=1 then do;re_units = prxparse("~\d+-\d[\d.]*\s*\K\S+~s");end;
	if missing(re_units) then do; putlog "INVALID REGEX" ;end;
    do i=1 to 10;
	    if prxmatch(re_units, TXT) then do; units = prxposn(re_units,i,TXT);end;
		output;
	end;
run;
This always yields "INVALID REGEX" in the log. But in a regex simulator the same pattern works without problems - see this. I don't know why this is happening.
Hi,
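I think it's because \K is not supported in SAS. The PRX functions follow Perl 5.6.1 regex syntax, and \K was only added in a later Perl version, so PRXPARSE fails to compile the pattern and returns a missing value.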
data have;
length txt $200;
txt='cd34     8.-9.. µg/m²    30.2 (?=)(/&%$§")';output;
txt='M a r s h ma llow           8-9 Kg/day    something     !+-*/    45.0-5.4 x(/&%$§")';output;
txt='cd34+           8-9 µg/m²    30.2 (?=)(/&%$§")';output;
txt='WBC                            4.27-11.40 k/uL                        3.64 (L)';output;
txt='RBC                            3.90-5.03 m/uL                         4.30 m/uL';output;
txt='Hemoglobin                     10.6-13.4 g/dL                         13.0';output;
txt='Hematocrit                     32.2-39.8 %                            36.1';output;
txt='MCV                            74.4-87.6 fL                           84.0';output;
txt='MCH                            24.8-29.5 pG                           30.2 (H)';output;
txt='MCHC                           31.8-34.9 g/dL                         36.0 (H)';output;
txt='RDW-CV                         12.2-14.4 %                            13.2';output;
txt='Platelet Count                 150-400 k/uL                           175';output;
txt='MPV                            9.2-11.4 fL                            8.6 (L)';output;
txt='Neut%                          28.6-74.5 %                            43.1';output;
txt='Abs Neut (ANC)                 1.63-7.87 k/uL                         1.57 (L)';output;
txt='Lymph%                         15.5-57.8 %                            43.7';output;
txt='Abs Lymph                      0.97-4.28 k/uL                         1.59';output;
txt='Mono%                          4.2-12.3 %                             9.3';output;
txt='Abs Mono                       0.19-0.85 k/uL                         0.34';output;
txt='Eosin%                         0.0-4.7 %                              3.6';output;
txt='Abs Eosin                      0.00-0.52 k/uL                         0.13';output;
txt='Baso%                          0.0-0.7 %                              0.3';output;
txt='Abs Baso                       0.00-0.06 k/uL                         0.01';output;
run;
data want;
   set have;
   unit=prxchange('s/(([^\s]+\s)+)\s{2,}([\d.-]+)\s+(\S+)\s+.*/$4/',-1,txt);
   if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
   put unit=;
run;
Log output:
unit=µg/m²
unit=Kg/day
unit=µg/m²
unit=k/uL
unit=m/uL
unit=g/dL
unit=%
unit=fL
unit=pG
unit=g/dL
unit=%
unit=k/uL
unit=fL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
unit=%
unit=k/uL
This works too:
data want;
   set have;
   unit=scan(scan(prxchange('s/\s{3,}/#/',-1,strip(txt)),2,'#'),2,' ');
   if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
   put unit=;
run;
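If you'd rather stay close to the original PRXPARSE/PRXMATCH attempt, a capture group can stand in for \K. Here is a minimal sketch, assuming the unit is always the first non-blank token right after the reference range. The pattern, the variable name unit and the $20 length are my own assumptions, and I have only reasoned it through against the sample rows above:

```sas
data want;
   set have;
   length unit $20;          /* assumed maximum unit length */
   retain re_units;
   if _N_ = 1 then do;
      /* range such as 4.27-11.40 or 8-9, then blanks, then the unit in capture group 1 */
      re_units = prxparse('/\d[\d.]*-\d[\d.]*\s+(\S+)/');
      if missing(re_units) then putlog 'INVALID REGEX';
   end;
   /* PRXPOSN returns the text of capture group 1 from the last successful match */
   if prxmatch(re_units, txt) then unit = prxposn(re_units, 1, txt);
   drop re_units;
run;
```

Note that PRXPOSN takes the capture group number as its second argument, so there is no need for the do i=1 to 10 loop: with a single group, only group 1 is ever populated.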
- Cheers -
How about this one?
data have;
input txt $80.;
cards;
WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
;
data want;
 set have;
 pid=prxparse('/\d+\.\d+\-\d+\.\d+\s+\S+/');
 call prxsubstr(pid,txt,p,l);
 if p then want=scan(substr(txt,p,l),-1,' ');
 drop pid p l;
run;