<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Stop Words in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617789#M181100</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/192941"&gt;@mar00390&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sorry, but I need to know a bit more to answer that.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you ran my example code, note that the original string is also kept in the output, so each record has both the before and after versions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you used your own data, I need an example: at least one stop word and a string in which that stop word occurs. Then I'll look into it.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Jan 2020 15:18:16 GMT</pubDate>
    <dc:creator>ErikLund_Jensen</dc:creator>
    <dc:date>2020-01-16T15:18:16Z</dc:date>
    <item>
      <title>Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617542#M180983</link>
      <description>&lt;P&gt;&lt;SPAN&gt;How do I delete prepositions/conjunctions/auxiliary verbs from a string? My strings have a length of 32,767.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2020 19:21:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617542#M180983</guid>
      <dc:creator>mar00390</dc:creator>
      <dc:date>2020-01-15T19:21:52Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617546#M180985</link>
      <description>&lt;P&gt;Base SAS doesn't contain any functionality to identify parts of speech. You are limited to word and character pattern matches.&lt;/P&gt;
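&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As an illustration of such a word pattern match, here is a minimal sketch that removes a short list of words with PRXCHANGE. It does not identify parts of speech: the word list (the, of, and, is) and the data set names HAVE_PRX and WANT_PRX are hypothetical, and every word to remove still has to be listed explicitly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Sketch: remove a fixed, hypothetical list of words with one regular expression;
data have_prx;
	infile cards truncover;
	input string $char200.;
	cards;
The cost of steel and copper is rising
;
run;

data want_prx;
	set have_prx;
	length newstr $200;
	* \b marks word boundaries, the trailing i makes the match case-insensitive, and -1 replaces every occurrence;
	newstr = prxchange('s/\b(the|of|and|is)\b//i', -1, string);
	* Remove leading blanks and collapse the leftover runs of blanks to single blanks;
	newstr = compbl(strip(newstr));
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With the test line above, NEWSTR should contain cost steel copper rising. The \b word boundaries keep the pattern from matching inside longer words such as rising.&lt;/P&gt;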
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Text Miner probably has more capabilities, but I doubt it can parse grammatical terms.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2020 19:49:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617546#M180985</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2020-01-15T19:49:53Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617549#M180988</link>
      <description>You need to use the Text Parsing node with a stop list: &lt;BR /&gt;&lt;A href="https://documentation.sas.com/?docsetId=tmref&amp;amp;docsetTarget=p0h83e35apv01tn13y0ulqr9r2tp.htm&amp;amp;docsetVersion=14.3&amp;amp;locale=en" target="_blank"&gt;https://documentation.sas.com/?docsetId=tmref&amp;amp;docsetTarget=p0h83e35apv01tn13y0ulqr9r2tp.htm&amp;amp;docsetVersion=14.3&amp;amp;locale=en&lt;/A&gt;</description>
      <pubDate>Wed, 15 Jan 2020 19:55:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617549#M180988</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-01-15T19:55:49Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617550#M180989</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/192941"&gt;@mar00390&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You need to create a list of stop words. There might be one you can download as a starting point, but otherwise it's just hard work. Given the list and a SAS data set with your strings, an easy solution is to use a format to identify the stop words in each string. The following working code shows the principle.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You need to set proper lengths etc. to make it work with your data; for strings of 32,767 characters, STRING and NEWSTR need a length of $32767. Given your word classes, it is probably unnecessary to handle uppercase/lowercase words, but it can be done by applying the LOWCASE function to teststr in the PUT call, with the stop words stored in lowercase. Also be aware that words in the output string are always separated by a single blank, even if the input string has several.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
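&lt;P&gt;A hash object, the alternative mentioned later in this thread, can do the same lookup without the format step, and applying LOWCASE only to the lookup key handles mixed-case input. The following is a minimal, self-contained sketch of that idea, separate from the format-based code below: the data set names SW_LIST, HAVE_MIXED and WANT_HASH are hypothetical, and it assumes the stop words are stored in lowercase.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Sketch: case-insensitive stop-word removal with a hash object instead of a format;
* Assumption: the stop list is stored in lowercase;
data sw_list;
	infile cards truncover;
	input stopword $50.;
	cards;
abc
xyz
;
run;

data have_mixed;
	infile cards truncover;
	input string $char200.;
	cards;
aaa ABC bbbbbbbbbb Xyz 123
;
run;

data want_hash (drop=i teststr stopword);
	* STOPWORD is the host variable for the hash key, with the same length as in SW_LIST;
	length newstr $200 teststr $50 stopword $50;
	* Load the stop list into the hash object once, on the first iteration;
	if _n_ = 1 then do;
		declare hash sw(dataset: 'sw_list');
		sw.defineKey('stopword');
		sw.defineDone();
		call missing(stopword);
	end;
	set have_mixed;
	do i = 1 to countw(string,' ');
		teststr = scan(string,i,' ');
		* CHECK returns 0 when the key is found, so a word is kept only when it is not a stop word;
		* LOWCASE is applied to the lookup key only, so kept words retain their original case;
		if sw.check(key: lowcase(teststr)) ne 0 then newstr = catx(' ',newstr,teststr);
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With the test line above, NEWSTR should contain aaa bbbbbbbbbb 123.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;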
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Test data;
data stopwords;
	input stopword $20.;
	cards;
abc 
xyz 
;
run;

data have;
	infile cards truncover;
	input string $char200.;
	cards;
aaa abc bbbbbbbbbb c 123 dddd ffff xyz
123 zzzzzzzzzz xyz hhhhhh
;
run;

* Create format: each stop word is mapped to itself, and an OTHER range
  (HLO=O) maps every other value to blank;
data stopfmt; set stopwords end=end;
	retain type 'C' fmtname 'stopfmt';
	start = stopword;
	label = stopword;
	output;
	* After the last stop word, add the OTHER range;
	if end then do;
		hlo = 'O';
		start = '';
		label = '';
		output;
	end;
run;
proc format cntlin=stopfmt;
run;

* Remove all words defined as stop words from string;
data want (drop=i teststr); set have;
	length newstr $200 teststr $50;
	do i = 1 to countw(string,' ');
		teststr = scan(string,i,' ');
		* The format returns blank for any word not in the stop list, so only those words are appended to NEWSTR;
		if put(teststr,$stopfmt.) = '' then newstr = catx(' ',newstr,teststr);
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 15 Jan 2020 19:55:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617550#M180989</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2020-01-15T19:55:54Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617552#M180990</link>
      <description>A smart guy with a better command of hash objects would give a more elegant solution without the format step.</description>
      <pubDate>Wed, 15 Jan 2020 19:58:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617552#M180990</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2020-01-15T19:58:48Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617554#M180991</link>
      <description>Pretty sure the OP is using Text Miner, though.</description>
      <pubDate>Wed, 15 Jan 2020 20:02:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617554#M180991</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-01-15T20:02:06Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617757#M181081</link>
      <description>&lt;P&gt;This left the strings unchanged. Is there a reason it wouldn't work?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 14:17:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617757#M181081</guid>
      <dc:creator>mar00390</dc:creator>
      <dc:date>2020-01-16T14:17:04Z</dc:date>
    </item>
    <item>
      <title>Re: Stop Words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617789#M181100</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/192941"&gt;@mar00390&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sorry, but I need to know a bit more to answer that.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you ran my example code, note that the original string is also kept in the output, so each record has both the before and after versions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you used your own data, I need an example: at least one stop word and a string in which that stop word occurs. Then I'll look into it.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 15:18:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Stop-Words/m-p/617789#M181100</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2020-01-16T15:18:16Z</dc:date>
    </item>
  </channel>
</rss>

