<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Clustering Twitter data and TF-IDF Matrix in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Clustering-Twitter-data-and-TF-IDF-Matrix/m-p/210344#M2911</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello everyone! I'm currently using SAS Enterprise Miner 12.1 and am having some trouble working out how to proceed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a data set consisting of tweets, and I intend to create clusters from the information I collected. So far, I've cleaned the data and built a diagram like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="diagrama.PNG" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/11480_diagrama.PNG" width="450" /&gt;&lt;/P&gt;&lt;P&gt;I also understand that the TF-IDF matrix can be found in the "exported data" option of the Text Filter node&lt;/P&gt;&lt;P&gt;(I found out about it in these 2 other discussion posts:&lt;/P&gt;&lt;P&gt;&lt;A __default_attr="257527" __jive_macro_name="message" class="jive_macro jive_macro_message" href="https://communities.sas.com/" modifiedtitle="true" title="Re: TF-IDF in SAS Text Miner"&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A __default_attr="228854" __jive_macro_name="message" class="jive_macro jive_macro_message" href="https://communities.sas.com/"&gt;&lt;/A&gt;)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It looks like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="matriz_tfidf.PNG" class="jiveImage" src="https://communities.sas.com/legacyfs/online/11481_matriz_tfidf.PNG" style="width: 422px; height: 442px;" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is this it?&lt;/P&gt;&lt;P&gt;So the question is:&lt;/P&gt;&lt;P&gt;Assuming this is the matrix I need to feed to the clustering node as the feature vectors for the clustering algorithm: if I simply run the Text Cluster node, will it use the TF-IDF matrix by default, or do I have to change the input somehow? And do I need to change the node configuration itself?
&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;In the Text Filter node I set the Frequency weighting to LOG and the Term weight to IDF.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 31 Aug 2015 00:30:48 GMT</pubDate>
    <dc:creator>hugo_viga</dc:creator>
    <dc:date>2015-08-31T00:30:48Z</dc:date>
    <item>
      <title>Clustering Twitter data and TF-IDF Matrix</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Clustering-Twitter-data-and-TF-IDF-Matrix/m-p/210344#M2911</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello everyone! I'm currently using SAS Enterprise Miner 12.1 and am having some trouble working out how to proceed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a data set consisting of tweets, and I intend to create clusters from the information I collected. So far, I've cleaned the data and built a diagram like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="diagrama.PNG" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/11480_diagrama.PNG" width="450" /&gt;&lt;/P&gt;&lt;P&gt;I also understand that the TF-IDF matrix can be found in the "exported data" option of the Text Filter node&lt;/P&gt;&lt;P&gt;(I found out about it in these 2 other discussion posts:&lt;/P&gt;&lt;P&gt;&lt;A __default_attr="257527" __jive_macro_name="message" class="jive_macro jive_macro_message" href="https://communities.sas.com/" modifiedtitle="true" title="Re: TF-IDF in SAS Text Miner"&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A __default_attr="228854" __jive_macro_name="message" class="jive_macro jive_macro_message" href="https://communities.sas.com/"&gt;&lt;/A&gt;)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It looks like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="matriz_tfidf.PNG" class="jiveImage" src="https://communities.sas.com/legacyfs/online/11481_matriz_tfidf.PNG" style="width: 422px; height: 442px;" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is this it?&lt;/P&gt;&lt;P&gt;So the question is:&lt;/P&gt;&lt;P&gt;Assuming this is the matrix I need to feed to the clustering node as the feature vectors for the clustering algorithm: if I simply run the Text Cluster node, will it use the TF-IDF matrix by default, or do I have to change the input somehow? And do I need to change the node configuration itself?
&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;In the Text Filter node I set the Frequency weighting to LOG and the Term weight to IDF.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 31 Aug 2015 00:30:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Clustering-Twitter-data-and-TF-IDF-Matrix/m-p/210344#M2911</guid>
      <dc:creator>hugo_viga</dc:creator>
      <dc:date>2015-08-31T00:30:48Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering Twitter data and TF-IDF Matrix</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Clustering-Twitter-data-and-TF-IDF-Matrix/m-p/235055#M3338</link>
      <description>&lt;P&gt;&lt;A class="lia-link-navigation lia-page-link lia-user-name-link" id="link_8" style="color: rgb(153, 153, 153);" href="https://communities.sas.com/t5/user/viewprofilepage/user-id/25568" target="_self"&gt;&lt;SPAN&gt;hugo_viga&lt;/SPAN&gt;&lt;/A&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First, thanks for using SAS. My name is Jason Xin, and I am an advanced analytics solution architect at SAS Institute.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In EM, the Text Parsing node does all the heavy-duty initial work, ending in a frequency matrix. The Text Filter node is essentially where most of the human-machine interaction happens: subsetting, trimming terms, keep/drop decisions, viewing terms, and so on. Although the content has been massaged in various ways, and the exported data sets certainly look different, in essence it remains a frequency matrix / query matrix.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In rare cases one benefits from clustering directly on the count matrix. In most cases, which I suspect include yours, you would use SVD as the input to text clustering. I cannot find a machine that runs 12.1, but as I recall, SVD in 12.1 was inside the Text Cluster node, the same as in 14.1, which I am running now. So the answer to your question is simply to connect the Text Filter node to the Text Cluster node and configure SVD there.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
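&lt;P&gt;As a rough sketch of that pipeline outside Enterprise Miner (this is Python with NumPy, not SAS code, and not how the Text Cluster node is implemented): the toy counts are made up, the helper names weight_counts and svd_reduce are hypothetical, and the weighting formulas, log2(count + 1) and log2(N / document frequency) + 1, are assumptions that should be checked against the Text Filter documentation.&lt;/P&gt;
&lt;PRE&gt;
```python
import numpy as np

# Toy document-by-term count matrix (rows: tweets, columns: terms).
# The values are invented purely for illustration.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)


def weight_counts(counts):
    """Apply log frequency weighting and an IDF term weight (assumed formulas)."""
    n_docs = counts.shape[0]
    local = np.log2(counts + 1.0)                # log frequency weighting
    doc_freq = np.count_nonzero(counts, axis=0)  # documents containing each term
    # Assumes every term occurs in at least one document (no zero doc_freq).
    idf = np.log2(n_docs / doc_freq) + 1.0       # IDF term weight
    return local * idf


def svd_reduce(weighted, k):
    """Project each document onto the top-k singular dimensions."""
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k] * s[:k]


weighted = weight_counts(counts)
reduced = svd_reduce(weighted, k=2)
print(reduced.shape)
```
&lt;/PRE&gt;
&lt;P&gt;The rows of reduced are the low-dimensional document vectors; a clustering algorithm such as k-means would then run on those rows rather than on the raw weighted counts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;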
&lt;P&gt;Hope this helps. Best regards,&lt;/P&gt;
&lt;P&gt;Jason Xin&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2015 16:14:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Clustering-Twitter-data-and-TF-IDF-Matrix/m-p/235055#M3338</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2015-11-17T16:14:29Z</dc:date>
    </item>
  </channel>
</rss>

