<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Restricting keywords in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377045#M276507</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the raw data looked like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Firm A advises on the Volkswagen settlement with US authorities regarding emissions from diesel engines Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;how would you create two want values and pick up both&amp;nbsp;&lt;SPAN&gt;Volkswagen &amp;amp; &amp;nbsp;The Carlyle Group?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;Chris&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 18 Jul 2017 15:18:28 GMT</pubDate>
    <dc:creator>cmoore</dc:creator>
    <dc:date>2017-07-18T15:18:28Z</dc:date>
    <item>
      <title>Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375685#M276500</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a column of sentences from which I want to extract a specific keyword into a new column. Please see below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Sentence&lt;/TD&gt;&lt;TD&gt;Keyword Desired&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Firm A helps Midea to successfully complete its takeover of KUKA&lt;/TD&gt;&lt;TD&gt;Midea&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Firm A advises on the Volkswagen settlement with US authorities regarding emissions from diesel engines&lt;/TD&gt;&lt;TD&gt;Volkswagen&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech&lt;/TD&gt;&lt;TD&gt;The Carlyle Group&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The sentences are written quite similarly, and most of the keywords I want fall immediately after "advises" or "helps". Is there a way of obtaining the keywords automatically after those identifiers?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many Thanks&lt;BR /&gt;&lt;BR /&gt;Chris&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jul 2017 13:49:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375685#M276500</guid>
      <dc:creator>cmoore</dc:creator>
      <dc:date>2017-07-13T13:49:37Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375692#M276501</link>
      <description>&lt;P&gt;There is; the question is how to know when to stop taking words after helps/advises. Anyway, to start:&lt;/P&gt;
&lt;PRE&gt;data want;
  set have;
  /* +6 skips "helps" (5 characters) plus the blank that follows it */
  if index(sentence,"helps") &amp;gt; 0 then after=substr(sentence,index(sentence,"helps") + 6);
run;&lt;/PRE&gt;
&lt;P&gt;This will take everything in the string after the word helps. You can do the same for advises. But I don't see any logical way of knowing whether 1, 2, 3, etc. words after helps should be saved.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jul 2017 14:03:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375692#M276501</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-07-13T14:03:07Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375835#M276502</link>
      <description>&lt;P&gt;Are you lucky enough that&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;All your sentences have the term&amp;nbsp;"Firm A" preceding (by some number of words) the desired keyword(s)?&lt;/LI&gt;
&lt;LI&gt;The desired keyword(s) will&amp;nbsp;always be the first&amp;nbsp;set of&amp;nbsp;words (after "Firm A") with initial letters capitalized?&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Then you could find the location of "Firm A" in your sentence, and then scan the remainder looking for words with initial letters capitalized:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=_:);
  set have;
  _ix = indexw(sentence,"Firm A");                   /* Find "Firm A" in the sentence */
  _mod_string=substr(sentence,_ix+length("Firm A")); /* Get remainder of the sentence */

  length key_word $30;
  /* Extract each word and see if it starts with a capital letter */
  /* Once non-capitalized word is encountered and key_word is not blank, leave the loop */
  do _w=1 to countw(_mod_string,' ');
    _test_word=scan(_mod_string,_w,' ');
    if char(_test_word,1)=upcase(char(_test_word,1)) then key_word=catx(' ',key_word,_test_word);
    else if key_word^=' ' then leave;
  end;    
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jul 2017 21:13:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375835#M276502</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-07-13T21:13:37Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375854#M276503</link>
      <description>&lt;P&gt;If you have a list of companies of interest and all the likely permutations of the spelling of the names, you may be better off using a search to indicate found words.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Something that is unreliable but may help with some of the multi-word names is to keep sequential words that start with uppercase letters (appended in a single variable). Names containing common lowercase words such as "of" or "and" would fail, though you might get enough matches.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
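&lt;P&gt;As a rough sketch of the list-based search: this assumes a hypothetical lookup table COMPANIES with one name per row and a HAVE dataset whose text variable is called SENTENCE (both names are assumptions, so adjust them to your data). INDEXW only matches whole words or phrases, and by default it treats only blanks as word boundaries, so punctuation attached to a name would prevent a match. A sentence that mentions several listed names yields one output row per name.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical lookup table; replace with your own list of names and spellings */
data companies;
  infile datalines truncover;
  input name $50.;
  datalines;
Midea
Volkswagen
The Carlyle Group
;
run;

/* Pair each sentence with every listed name it contains (case-insensitive) */
proc sql;
  create table found as
  select h.sentence, c.name as keyword
  from have as h, companies as c
  where indexw(upcase(h.sentence), upcase(strip(c.name))) &amp;gt; 0;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The cross join compares every sentence with every name, so the work grows with the product of the two table sizes; with a short list of names that should stay manageable.&lt;/P&gt;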
&lt;P&gt;How many records are you going to be searching?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jul 2017 22:13:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375854#M276503</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-07-13T22:13:25Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375964#M276504</link>
      <description>&lt;P&gt;Here is something to get you started.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards truncover;
input x $200.;
cards;
Firm A helps Midea to successfully complete its takeover of KUKA
Firm A advises on the Volkswagen settlement with US authorities regarding emissions from diesel engines
Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech
;
run;

data want;
 set have;
 temp=substr(x,prxmatch('/helps|advises/i',x));
 want=catx(' ',scan(temp,2),scan(temp,3),scan(temp,4));
 drop temp;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 14 Jul 2017 10:51:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/375964#M276504</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-07-14T10:51:06Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/376218#M276505</link>
      <description>&lt;P&gt;The following code uses a regular expression that captures the first run of capitalized words after the terms &lt;EM&gt;help&lt;/EM&gt; or &lt;EM&gt;advise&lt;/EM&gt;.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  infile cards truncover;
  input have_str $200.;
  cards;
Firm A helps Midea to successfully complete its takeover of KUKA  
Firm A advised on the Volkswagen settlement with US authorities regarding emissions from diesel engines 
Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech
;
run;

data want;
  set have;
  length want $100;
  retain _prxid;

  if _n_=1 then
    _prxid=prxparse('/(\bhelp|\badvise)[^A-Z]{0,10}(([A-Z]\w+\s*\b)+)/');

  if prxmatch(_prxid, have_str) then
    do;
      want=prxposn(_prxid, 2, have_str);
    end;
run;

proc print data=want;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
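&lt;P&gt;One limitation raised earlier in the thread is that names containing lowercase words such as "of" or "and" get cut short. A possible variant of the pattern is sketched below. It assumes that "of" and "and" are the only connector words that matter, and the output dataset name WANT_CONNECTORS is made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want_connectors;
  set have;
  length want $100;
  retain _prxid;

  /* Assumption: only "of" and "and" may appear in lowercase inside a name. */
  /* Capture buffer 1 holds the name; the (?:...) groups do not capture.   */
  if _n_=1 then
    _prxid=prxparse('/\b(?:help|advise)[^A-Z]{0,10}([A-Z]\w+(?:\s+(?:(?:of|and)\s+)?[A-Z]\w+)*)/');

  if prxmatch(_prxid, have_str) then
    want=prxposn(_prxid, 1, have_str);

  drop _prxid;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For a hypothetical sentence such as "Firm A advises Bank of America on a deal", this variant should capture Bank of America rather than stopping at Bank. A connector is only accepted when another capitalized word follows it, so the "on" after a name still ends the match.&lt;/P&gt;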
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Jul 2017 10:45:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/376218#M276505</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-07-15T10:45:32Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/376437#M276506</link>
      <description>&lt;P&gt;There are around 40k records that I would be screening.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jul 2017 09:08:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/376437#M276506</guid>
      <dc:creator>cmoore</dc:creator>
      <dc:date>2017-07-17T09:08:38Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377045#M276507</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the raw data looked like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Firm A advises on the Volkswagen settlement with US authorities regarding emissions from diesel engines Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;how would you create two want values and pick up both&amp;nbsp;&lt;SPAN&gt;Volkswagen &amp;amp; &amp;nbsp;The Carlyle Group?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;Chris&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 15:18:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377045#M276507</guid>
      <dc:creator>cmoore</dc:creator>
      <dc:date>2017-07-18T15:18:28Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377147#M276508</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/136670"&gt;@cmoore&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Is it always&amp;nbsp;&lt;SPAN&gt;&lt;EM&gt;Firm A&lt;/EM&gt;? Is this a term we can use as a keyword for a search?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 22:06:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377147#M276508</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-07-18T22:06:01Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377160#M276509</link>
      <description>&lt;P&gt;Or:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data WANT;
  set HAVE;
  KEYWORD=prxchange('s/.*?\b(help|advise)[^A-Z]{0,10}((\b[A-Z]\S+ )+).*/$2/o',1,TXT);
  put KEYWORD=;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;KEYWORD=Midea&lt;BR /&gt;KEYWORD=Volkswagen&lt;BR /&gt;KEYWORD=The Carlyle Group&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2017 02:14:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377160#M276509</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-07-19T02:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377192#M276510</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;No, it's not always &lt;EM&gt;Firm A&lt;/EM&gt;. The text itself could start with one of the 3 keywords, such as &lt;EM&gt;advised&lt;/EM&gt;. I only used Firm A to start the example sentences, so it is not a keyword in the search.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;Chris&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2017 07:16:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377192#M276510</guid>
      <dc:creator>cmoore</dc:creator>
      <dc:date>2017-07-19T07:16:23Z</dc:date>
    </item>
    <item>
      <title>Re: Restricting keywords</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377223#M276511</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/136670"&gt;@cmoore&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Something like the code below could work.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  infile cards truncover;
  input have_str $200.;
  datalines;
Firm A helps Midea to successfully complete its takeover of KUKA  
Firm A advised on the Volkswagen settlement with US authorities regarding emissions from diesel engines 
Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech
Firm A advises on the Volkswagen settlement with US authorities regarding emissions from diesel engines Firm A advises The Carlyle Group on its $3.2bn acquisition of Atotech
;
run;

data want(drop=_:);
  set have;
  length want $100;
  retain _prxid;

  if _n_=1 then
    _prxid=prxparse('/(\bhelp|\badvise)[^A-Z]{0,10}(([A-Z]\w+\s*\b)+)/');
  _start=1;
  _stop=length(have_str);

  call prxnext(_prxid, _start, _stop, have_str, _pos, _len);
  do while (_pos &amp;gt; 0);
    want=catx('|',want,prxposn(_prxid, 2, have_str));
    call prxnext(_prxid, _start, _stop, have_str, _pos, _len);
  end;

run;

proc print data=want;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 19 Jul 2017 10:12:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Restricting-keywords/m-p/377223#M276511</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-07-19T10:12:20Z</dc:date>
    </item>
  </channel>
</rss>

