<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic RegEx in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689894#M209773</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've got a blood test file stored as a string (column named TXT), and I&lt;SPAN&gt;&amp;nbsp;need to extract just the measurement units, e.g. "K/UL", "M/UL", "%", etc., from text like the following:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE class="default s-code-block hljs yaml"&gt;&lt;CODE&gt;&lt;SPAN class="hljs-string"&gt;WBC&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;4.27&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-11.40&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;k/uL&lt;/SPAN&gt;                        &lt;SPAN class="hljs-number"&gt;3.64&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;(L)&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;RBC&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;3.90&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-5.03&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;m/uL&lt;/SPAN&gt;                         &lt;SPAN class="hljs-number"&gt;4.30&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;Hemoglobin&lt;/SPAN&gt;                     &lt;SPAN class="hljs-number"&gt;10.6&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-13.4&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;g/dL&lt;/SPAN&gt;                         &lt;SPAN class="hljs-number"&gt;13.0&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;Hematocrit&lt;/SPAN&gt;                     &lt;SPAN class="hljs-number"&gt;32.2&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-39.8&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;%&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;36.1&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;MCV&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;74.4&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-87.6&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;fL&lt;/SPAN&gt;                           &lt;SPAN class="hljs-number"&gt;84.0&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;MCH&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;24.8&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-29.5&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;pG&lt;/SPAN&gt;                           &lt;SPAN class="hljs-number"&gt;30.2&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;(H)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I wrote this code :&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ds;
	set data;
	retain re_units;
	if _N_=1 then do;re_units = prxparse("~\d+-\d[\d.]*\s*\K\S+~s");end;
	if missing(re_units) then do; putlog "INVALID REGEX" ;end;
    do i=1 to 10;
	    if prxmatch(re_units, TXT) then do; units = prxposn(re_units,i,TXT);end;
		output;
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This always writes "INVALID REGEX" to the log. However, the same pattern works without problems in an online regex tester (&lt;A href="https://regex101.com/r/2u0cpP/1" target="_self"&gt;see this&lt;/A&gt;). I don't know why this is happening.&lt;/P&gt;</description>
    <pubDate>Thu, 08 Oct 2020 08:11:45 GMT</pubDate>
    <dc:creator>shakednav</dc:creator>
    <dc:date>2020-10-08T08:11:45Z</dc:date>
    <item>
      <title>RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689894#M209773</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've got a blood test file stored as a string (column named TXT), and I&lt;SPAN&gt;&amp;nbsp;need to extract just the measurement units, e.g. "K/UL", "M/UL", "%", etc., from text like the following:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE class="default s-code-block hljs yaml"&gt;&lt;CODE&gt;&lt;SPAN class="hljs-string"&gt;WBC&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;4.27&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-11.40&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;k/uL&lt;/SPAN&gt;                        &lt;SPAN class="hljs-number"&gt;3.64&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;(L)&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;RBC&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;3.90&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-5.03&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;m/uL&lt;/SPAN&gt;                         &lt;SPAN class="hljs-number"&gt;4.30&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;Hemoglobin&lt;/SPAN&gt;                     &lt;SPAN class="hljs-number"&gt;10.6&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-13.4&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;g/dL&lt;/SPAN&gt;                         &lt;SPAN class="hljs-number"&gt;13.0&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;Hematocrit&lt;/SPAN&gt;                     &lt;SPAN class="hljs-number"&gt;32.2&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-39.8&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;%&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;36.1&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;MCV&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;74.4&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-87.6&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;fL&lt;/SPAN&gt;                           &lt;SPAN class="hljs-number"&gt;84.0&lt;/SPAN&gt;
&lt;SPAN class="hljs-string"&gt;MCH&lt;/SPAN&gt;                            &lt;SPAN class="hljs-number"&gt;24.8&lt;/SPAN&gt;&lt;SPAN class="hljs-number"&gt;-29.5&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;pG&lt;/SPAN&gt;                           &lt;SPAN class="hljs-number"&gt;30.2&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;(H)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I wrote this code :&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ds;
	set data;
	retain re_units;
	if _N_=1 then do;re_units = prxparse("~\d+-\d[\d.]*\s*\K\S+~s");end;
	if missing(re_units) then do; putlog "INVALID REGEX" ;end;
    do i=1 to 10;
	    if prxmatch(re_units, TXT) then do; units = prxposn(re_units,i,TXT);end;
		output;
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This always writes "INVALID REGEX" to the log. However, the same pattern works without problems in an online regex tester (&lt;A href="https://regex101.com/r/2u0cpP/1" target="_self"&gt;see this&lt;/A&gt;). I don't know why this is happening.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Oct 2020 08:11:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689894#M209773</guid>
      <dc:creator>shakednav</dc:creator>
      <dc:date>2020-10-08T08:11:45Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689935#M209799</link>
      <description>Can't SCAN() get it?&lt;BR /&gt;&lt;BR /&gt;units = scan(txt,-1,' ');</description>
      <pubDate>Thu, 08 Oct 2020 11:41:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689935#M209799</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-10-08T11:41:37Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689945#M209802</link>
      <description>Nope. Do not forget that this text is a long string (for each row).</description>
      <pubDate>Thu, 08 Oct 2020 12:11:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689945#M209802</guid>
      <dc:creator>shakednav</dc:creator>
      <dc:date>2020-10-08T12:11:31Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689954#M209806</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I think it's because \K is not &lt;A href="https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf" target="_blank" rel="noopener"&gt;supported&lt;/A&gt; in SAS.&lt;/P&gt;
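&lt;P&gt;Assuming \K is indeed the culprit, a capture group can do the same job: match the range, then read the unit from group 1 with PRXPOSN. A minimal sketch, not tested against the original data; the data set names demo and demo_units and the two sample rows are made up for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical sample rows, one test per row */
data demo;
   length txt $200;
   txt = 'WBC     4.27-11.40 k/uL     3.64 (L)'; output;
   txt = 'Hematocrit     32.2-39.8 %     36.1'; output;
run;

data demo_units;
   set demo;
   length unit $20;
   retain re_units;
   /* Compile once; group 1 captures what \K\S+ was meant to return */
   if _n_ = 1 then re_units = prxparse('/\d[\d.]*-\d[\d.]*\s+(\S+)/');
   /* PRXPOSN reads capture buffer 1 of the most recent match */
   if prxmatch(re_units, txt) then unit = prxposn(re_units, 1, txt);
   drop re_units;
   put unit=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that PRXPOSN takes a capture buffer number, so looping i from 1 to 10 over a pattern with no groups would not walk through successive matches.&lt;/P&gt;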
&lt;P&gt;&lt;A href="https://regex101.com/r/2u0cpP/2" target="_blank" rel="noopener"&gt;Try this:&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length txt $200;
txt='cd34     8.-9.. µg/m²    30.2 (?=)(/&amp;amp;%$§")';output;
txt='M a r s h ma llow           8-9 Kg/day    something     !+-*/    45.0-5.4 x(/&amp;amp;%$§")';output;
txt='cd34+           8-9 µg/m²    30.2 (?=)(/&amp;amp;%$§")';output;
txt='WBC                            4.27-11.40 k/uL                        3.64 (L)';output;
txt='RBC                            3.90-5.03 m/uL                         4.30 m/uL';output;
txt='Hemoglobin                     10.6-13.4 g/dL                         13.0';output;
txt='Hematocrit                     32.2-39.8 %                            36.1';output;
txt='MCV                            74.4-87.6 fL                           84.0';output;
txt='MCH                            24.8-29.5 pG                           30.2 (H)';output;
txt='MCHC                           31.8-34.9 g/dL                         36.0 (H)';output;
txt='RDW-CV                         12.2-14.4 %                            13.2';output;
txt='Platelet Count                 150-400 k/uL                           175';output;
txt='MPV                            9.2-11.4 fL                            8.6 (L)';output;
txt='Neut%                          28.6-74.5 %                            43.1';output;
txt='Abs Neut (ANC)                 1.63-7.87 k/uL                         1.57 (L)';output;
txt='Lymph%                         15.5-57.8 %                            43.7';output;
txt='Abs Lymph                      0.97-4.28 k/uL                         1.59';output;
txt='Mono%                          4.2-12.3 %                             9.3';output;
txt='Abs Mono                       0.19-0.85 k/uL                         0.34';output;
txt='Eosin%                         0.0-4.7 %                              3.6';output;
txt='Abs Eosin                      0.00-0.52 k/uL                         0.13';output;
txt='Baso%                          0.0-0.7 %                              0.3';output;
txt='Abs Baso                       0.00-0.06 k/uL                         0.01';output;
run;

data want;
   set have;
   unit=prxchange('s/(([^\s]+\s)+)\s{2,}([\d.-]+)\s+(\S+)\s+.*/$4/',-1,txt);
   if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
   put unit=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;unit=µg/m²&lt;BR /&gt;unit=Kg/day&lt;BR /&gt;unit=µg/m²&lt;BR /&gt;unit=k/uL&lt;BR /&gt;unit=m/uL&lt;BR /&gt;unit=g/dL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=fL&lt;BR /&gt;unit=pG&lt;BR /&gt;unit=g/dL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=k/uL&lt;BR /&gt;unit=fL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=k/uL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=k/uL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=k/uL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=k/uL&lt;BR /&gt;unit=%&lt;BR /&gt;unit=k/uL&lt;/P&gt;
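&lt;P&gt;The rows above hold one test each. If a whole report sits in a single TXT value with no line breaks, one option is to loop with CALL PRXNEXT, which resumes the search after each match. A minimal sketch under that assumption; the one-row string and the names units, re_units, pos and len are illustrative only:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data units;
   length txt $400 unit $20;
   /* Hypothetical report: several tests packed into one string */
   txt = 'WBC   4.27-11.40 k/uL   3.64 (L) RBC   3.90-5.03 m/uL   4.30 Hematocrit   32.2-39.8 %   36.1';
   re_units = prxparse('/\d[\d.]*-\d[\d.]*\s+(\S+)/');
   start = 1;
   stop = length(txt);
   /* Find the first match, then keep searching from where it ended */
   call prxnext(re_units, start, stop, txt, pos, len);
   do while (pos &gt; 0);
      unit = prxposn(re_units, 1, txt);   /* group 1 of the current match */
      output;                             /* one output row per unit found */
      call prxnext(re_units, start, stop, txt, pos, len);
   end;
   keep unit;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;CALL PRXNEXT updates start after every call, which is what moves the search forward; pos comes back as 0 once no further match exists.&lt;/P&gt;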
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This works too:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   set have;
   unit=scan(scan(prxchange('s/\s{3,}/#/',-1,strip(txt)),2,'#'),2,' ');
   if unit eq txt then put 'E' "RROR:# Smthg's wrong " _N_= unit=;
   put unit=;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 08 Oct 2020 14:31:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689954#M209806</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2020-10-08T14:31:26Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689960#M209809</link>
      <description>&lt;P&gt;How about this one?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data have;
input txt $80.;
cards;
WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
;
data want;
 set have;
 pid=prxparse('/\d+\.\d+\-\d+\.\d+\s+\S+/');
 call prxsubstr(pid,txt,p,l);
 if p then want=scan(substr(txt,p,l),-1,' ');
 drop pid p l;
run;&lt;/PRE&gt;</description>
      <pubDate>Thu, 08 Oct 2020 13:24:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689960#M209809</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-10-08T13:24:18Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689972#M209813</link>
      <description>Oh cool, but it yields only the very first unit ("k/uL").&lt;BR /&gt;How can I iterate through all of the matches?</description>
      <pubDate>Thu, 08 Oct 2020 13:33:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689972#M209813</guid>
      <dc:creator>shakednav</dc:creator>
      <dc:date>2020-10-08T13:33:28Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689978#M209816</link>
      <description>Please do not forget that the whole blood test is in a single column named TXT (with no line breaks)</description>
      <pubDate>Thu, 08 Oct 2020 13:45:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/689978#M209816</guid>
      <dc:creator>shakednav</dc:creator>
      <dc:date>2020-10-08T13:45:10Z</dc:date>
    </item>
    <item>
      <title>Re: RegEx</title>
      <link>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/690306#M210013</link>
      <description>Post more data so I can test the code.</description>
      <pubDate>Fri, 09 Oct 2020 10:50:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/RegEx/m-p/690306#M210013</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-10-09T10:50:30Z</dc:date>
    </item>
  </channel>
</rss>

