<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extract names of people from paragraph in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756661#M238937</link>
    <description>How are names uniquely identified in your text? Can you include some sample data?&lt;BR /&gt;&lt;BR /&gt;If there's no way to tell whether John or john refers to a person, or whether words like Apple or Blue are names, you're going to have some margin of error. &lt;BR /&gt;&lt;BR /&gt;I find Google's APIs relatively good at this. Do you have access to SAS Enterprise Miner (EM) with the text mining capabilities?</description>
    <pubDate>Mon, 26 Jul 2021 15:11:32 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2021-07-26T15:11:32Z</dc:date>
    <item>
      <title>Extract names of people from paragraph</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756599#M238915</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I want to extract the names of people from an article. Some of the names are preceded by a title and some are not.&lt;/P&gt;
&lt;P&gt;I am using PRXPARSE and PRXNEXT to find the names, and I am partially successful: as expected, other text that matches the pattern is extracted along with the names. Can you please suggest a way to find only the names, with or without a title?&lt;/P&gt;
&lt;P&gt;In the code below I am trying to find names without any title.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename source "location/source.txt";

proc http
	method="get"
	url="https://amabhungane.org/stories/210701-vbs-indictment-details-corrupt-gratifications-driving-illegal-municipal-investments-in-the-doomed-bank/"
	out=source;
run;
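data work.rep(drop=line rx1);
	infile source length=len lrecl=32767;
	input line $varying32767. len;
	retain rx1;
	/* compile the tag-stripping pattern once, on the first iteration */
	if _n_ = 1 then rx1 = prxparse("s/&amp;lt;.*?&amp;gt;//");
	line = strip(line);
	/* skip empty lines */
	if len &amp;gt; 0;
	/* keep only lines containing a paragraph tag, with every HTML tag removed */
	string = line;
	if find(line, '&amp;lt;p&amp;gt;') gt 0 then do;
		call prxchange(rx1, -1, string);
		output;
	end;
run;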
proc transpose data=work.rep out=work.rep_t;
	var string;
run;
data work.extracted_para;
	length paragraph $ 32767;
	set work.rep_t;
	/* PROC TRANSPOSE names the new columns COL1, COL2, ...; join all of them */
	paragraph = catx(". ", of col:);
	keep paragraph;
run;
data extract;
	set work.extracted_para;
	retain pattern_pos;
	/* two capitalized words in a row, e.g. David Beckham (no title) */
	if _n_ = 1 then pattern_pos = prxparse("/\b[A-Z]\w+\s[A-Z]\w+\b/");
	start_pos = 1;
	stop_pos = length(paragraph);
	call prxnext(pattern_pos, start_pos, stop_pos, paragraph, position, length);
	/* PRXNEXT sets position to 0 when there are no more matches */
	do while (position &amp;gt; 0);
		name = substr(paragraph, position, length);
		output;
		call prxnext(pattern_pos, start_pos, stop_pos, paragraph, position, length);
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
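&lt;P&gt;Below is a minimal sketch of one way to also allow an optional title. It assumes that a name is two or more proper-case words in a row. The title list (minister, president, officer) is illustrative only, and the data set and variable names &lt;CODE&gt;work.names&lt;/CODE&gt;, &lt;CODE&gt;rx&lt;/CODE&gt;, &lt;CODE&gt;match&lt;/CODE&gt; and &lt;CODE&gt;name&lt;/CODE&gt; are hypothetical. PRXPOSN returns capture group 1, so the title is kept in &lt;CODE&gt;match&lt;/CODE&gt; but left out of &lt;CODE&gt;name&lt;/CODE&gt;.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.names;
	length match name $ 200;
	set work.extracted_para;
	retain rx;
	/* Compile once. Optional title (illustrative list, either case), then
	   capture group 1: two or more proper-case words in a row */
	if _n_ = 1 then
		rx = prxparse('/\b(?:(?:[Mm]inister|[Pp]resident|[Oo]fficer)\s+)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b/');
	start = 1;
	stop = length(paragraph);
	call prxnext(rx, start, stop, paragraph, pos, len);
	do while (pos &amp;gt; 0);
		match = substr(paragraph, pos, len);  /* full match, title included */
		name = prxposn(rx, 1, paragraph);     /* capture group 1: name only */
		output;
		call prxnext(rx, start, stop, paragraph, pos, len);
	end;
	keep match name;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This still cannot tell a capitalized phrase, such as a heading or an organization, from a person's name, so some false positives are expected. A stop list of known non-name words, or a named-entity tool, would still be needed to filter them out.&lt;/P&gt;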
</description>
      <pubDate>Mon, 26 Jul 2021 11:38:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756599#M238915</guid>
      <dc:creator>kaziumair</dc:creator>
      <dc:date>2021-07-26T11:38:49Z</dc:date>
    </item>
    <item>
      <title>Re: Extract names of people from paragraph</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756661#M238937</link>
      <description>How are names uniquely identified in your text? Can you include some sample data?&lt;BR /&gt;&lt;BR /&gt;If there's no way to tell whether John or john refers to a person, or whether words like Apple or Blue are names, you're going to have some margin of error. &lt;BR /&gt;&lt;BR /&gt;I find Google's APIs relatively good at this. Do you have access to SAS Enterprise Miner (EM) with the text mining capabilities?</description>
      <pubDate>Mon, 26 Jul 2021 15:11:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756661#M238937</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-07-26T15:11:32Z</dc:date>
    </item>
    <item>
      <title>Re: Extract names of people from paragraph</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756725#M238979</link>
      <description>Hi, in the article the names are in proper case, for example David Beckham. Some names are preceded by titles, for example minister, president, officer, etc.&lt;BR /&gt;I do not have access to SAS EM, but I do have access to SAS Viya.</description>
      <pubDate>Mon, 26 Jul 2021 17:33:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-names-of-people-from-paragraph/m-p/756725#M238979</guid>
      <dc:creator>kaziumair</dc:creator>
      <dc:date>2021-07-26T17:33:39Z</dc:date>
    </item>
  </channel>
</rss>

