<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extracting keywords and corresponding number in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369905#M275612</link>
    <description>&lt;P&gt;OK, if you like Perl regular expressions:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
x="Heart: Normal heart muscle colon (maximal wall thickness = 15mm). Normal aorta (maximal wall thickness = 1.6mm).
Lung: Normal lung (maximal wall thickness = 1.9mm), however movement is absent from the distal part.
Other: Reactive lymphadenopathy is seen . No complications of disease were noted.";


pid=prxparse('/(heart muscle|aorta|lung)[\w\s]+\([^\(\)]+\)/i');
pid1=prxparse('/(heart muscle|aorta|lung).+\s([\d\.]+mm)/i');
s=1;
e=length(x);
call prxnext(pid,s,e,x,p,l);
do while(p&amp;gt;0);
 want=substr(x,p,l);
 if prxmatch(pid1,want) then do;
  call prxposn(pid1,1,p1,l1);
  x1=substr(want,p1,l1);
  call prxposn(pid1,2,p2,l2);
  x2=substr(want,p2,l2);
 end;
 output;
 call prxnext(pid,s,e,x,p,l);
end;


drop pid s e p l pid1 p1 l1 p2 l2;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 23 Jun 2017 14:34:37 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2017-06-23T14:34:37Z</dc:date>
    <item>
      <title>Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369817#M275607</link>
      <description>&lt;P&gt;Hi everyone&lt;/P&gt;
&lt;P&gt;Imagine a dataset containing the following string:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;"Heart: Normal heart&amp;nbsp;muscle&amp;nbsp;colon (maximal wall thickness = 15mm). Normal aorta&amp;nbsp;(maximal wall thickness = 1.6mm).&lt;/P&gt;
&lt;P class="p1"&gt;Lung: Normal lung&amp;nbsp;(maximal wall thickness = 1.9mm), however movement&amp;nbsp;is absent from the distal part.&lt;/P&gt;
&lt;P class="p1"&gt;Other: Reactive lymphadenopathy is seen . No complications of disease were noted."&lt;/P&gt;
&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;What I want is to extract data as follows:&lt;/P&gt;
&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE class="t1" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD valign="middle" class="td1"&gt;
&lt;P class="p1"&gt;Hear Muscle&lt;/P&gt;
&lt;/TD&gt;
&lt;TD valign="middle" class="td1"&gt;
&lt;P class="p1"&gt;15mm&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD valign="middle" class="td1"&gt;
&lt;P class="p1"&gt;Aorta&lt;/P&gt;
&lt;/TD&gt;
&lt;TD valign="middle" class="td1"&gt;
&lt;P class="p1"&gt;1.6mm&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD valign="middle" class="td1"&gt;
&lt;P class="p1"&gt;Lung&lt;/P&gt;
&lt;/TD&gt;
&lt;TD valign="middle" class="td1"&gt;
&lt;P class="p1"&gt;1.9mm&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So basically, the code needs to break the paragraph into sentences, find the sentences that contain one of the keywords (heart muscle, aorta, lung), and extract the corresponding number from the same sentence.&lt;/P&gt;
&lt;P&gt;Any help appreciated&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 10:56:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369817#M275607</guid>
      <dc:creator>ammarhm</dc:creator>
      <dc:date>2017-06-23T10:56:15Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369821#M275608</link>
      <description>&lt;P&gt;Whilst it may be technically possible to search the string, find things, and then extract further information, I really wouldn't recommend it. &amp;nbsp;It's one of the reasons free text in databases is frowned upon: you could have anything. &amp;nbsp;In this instance, as it is medical data, I would at the very minimum have a medic review the free text and provide their expert opinion on what should be extracted from it. &amp;nbsp;Someone could write anything in that free text. What would you do with:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"Heart: Normal heart&amp;nbsp;muscle&amp;nbsp;colon (maximal wall thickness = 15mm). Normal aorta&amp;nbsp;(maximal wall thickness = 1.6mm) moving to abnormal aorta (maximal wall thickness = 1.5mm)"&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 11:12:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369821#M275608</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-06-23T11:12:56Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369828#M275609</link>
      <description>&lt;P&gt;Thank you RW9&lt;/P&gt;
&lt;P&gt;I fully understand the limitation. However, this is the standard way the data was entered, so I felt comfortable using SAS as a first step. There will be further manual reviews of the results to make sure we are getting what we need.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 11:35:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369828#M275609</guid>
      <dc:creator>ammarhm</dc:creator>
      <dc:date>2017-06-23T11:35:12Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369852#M275610</link>
      <description>&lt;P&gt;This could give you a start.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
x="Heart: Normal heart muscle colon (maximal wall thickness = 15mm). Normal aorta (maximal wall thickness = 1.6mm).
Lung: Normal lung (maximal wall thickness = 1.9mm), however movement is absent from the distal part.
Other: Reactive lymphadenopathy is seen . No complications of disease were noted.";

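/* Splitting x on parentheses gives alternating pieces: the text before a "(", then the text inside it. */
do i=1 to countw(x,'()') by 2;
 x1=scan(scan(x,i,'()'),-1,'.:');  /* last phrase before the "(", after any "." or ":" */
 x2=scan(scan(x,i+1,'()'),-1,'='); /* value after the "=" inside the parentheses */
 if not missing(x1) and not missing(x2) then output;
end;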
keep x1 x2;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 23 Jun 2017 13:02:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369852#M275610</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-06-23T13:02:43Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369863#M275611</link>
      <description>&lt;P&gt;Thank you Ksharp&lt;BR /&gt;Almost there with your solution.&lt;BR /&gt;Assuming that I want to rely on the occurrence of the words "heart muscle", "aorta" or "lung" and extract the number in their sentences, could you please advise on how to do that?&lt;/P&gt;
&lt;P&gt;i.e. I don't want to rely on ":" or "=".&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 13:27:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369863#M275611</guid>
      <dc:creator>ammarhm</dc:creator>
      <dc:date>2017-06-23T13:27:35Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369905#M275612</link>
      <description>&lt;P&gt;OK, if you like Perl regular expressions:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
x="Heart: Normal heart muscle colon (maximal wall thickness = 15mm). Normal aorta (maximal wall thickness = 1.6mm).
Lung: Normal lung (maximal wall thickness = 1.9mm), however movement is absent from the distal part.
Other: Reactive lymphadenopathy is seen . No complications of disease were noted.";


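/* pid finds each keyword together with the parenthesised note that follows it; */
/* pid1 then pulls the keyword and the number ending in mm out of that piece. */
pid=prxparse('/(heart muscle|aorta|lung)[\w\s]+\([^\(\)]+\)/i');
pid1=prxparse('/(heart muscle|aorta|lung).+\s([\d\.]+mm)/i');
length x1 x2 $40;
s=1;
e=length(x);
call prxnext(pid,s,e,x,p,l);
do while(p&amp;gt;0);
 want=substr(x,p,l);
 call missing(x1,x2); /* do not carry values over from the previous match */
 if prxmatch(pid1,want) then do;
  call prxposn(pid1,1,p1,l1); /* capture group 1: the keyword */
  x1=substr(want,p1,l1);
  call prxposn(pid1,2,p2,l2); /* capture group 2: the number with its unit */
  x2=substr(want,p2,l2);
 end;
 output;
 call prxnext(pid,s,e,x,p,l); /* move on to the next match */
end;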


drop pid s e p l pid1 p1 l1 p2 l2;
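run;

/* A possible follow-up step, offered only as a sketch and not part of the
   answer above: turn the text in x2 (for example 1.6mm) into a numeric value.
   The names WANT_NUM and THICKNESS_MM are made up for illustration, and the
   step assumes every measurement is reported in mm. */
data want_num;
 set have;
 /* keep only digits and the decimal point, then read the result as a number */
 thickness_mm=input(compress(x2,'.','kd'),best12.);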
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 23 Jun 2017 14:34:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/369905#M275612</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-06-23T14:34:37Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/370204#M275613</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
string="Heart: Normal heart muscle colon (maximal wall thickness = 15mm). Normal aorta (maximal wall thickness = 1.6mm).
Lung: Normal lung (maximal wall thickness = 1.9mm), however movement is absent from the distal part.
Other: Reactive lymphadenopathy is seen . No complications of disease were noted.";
id1=prxparse('/\d+.?\d+\w+/');
id2=prxparse('/(?&amp;lt;=Normal )((\S+ ){1,3})(?=\()/');
start1=1;
start2=1;
end=length(string);
call prxnext(id2,start2,end,string,position2,length2);
call prxnext(id1,start1,end,string,position1,length1);
do while(position1&amp;gt;0);
 Name=substr(string,position2,length2);
 Number=substr(string,position1,length1);
 output;
 call prxnext(id2,start2,end,string,position2,length2);
 call prxnext(id1,start1,end,string,position1,length1);
end;
keep name number;
run;
proc print;run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 24 Jun 2017 00:38:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/370204#M275613</guid>
      <dc:creator>slchen</dc:creator>
      <dc:date>2017-06-24T00:38:24Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting keywords and corresponding number</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/370252#M275614</link>
      <description>Excellent work Ksharp, as usual&lt;BR /&gt;Thanks</description>
      <pubDate>Sat, 24 Jun 2017 11:31:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extracting-keywords-and-corresponding-number/m-p/370252#M275614</guid>
      <dc:creator>ammarhm</dc:creator>
      <dc:date>2017-06-24T11:31:04Z</dc:date>
    </item>
  </channel>
</rss>

