<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Co Word Analysis with SAS in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50499#M289</link>
    <pubDate>Fri, 17 Jul 2009 13:26:40 GMT</pubDate>
    <dc:creator>RussAlbright</dc:creator>
    <dc:date>2009-07-17T13:26:40Z</dc:date>
    <item>
      <title>Co Word Analysis with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50498#M288</link>
      <description>I would like to run a co-word analysis with SAS Text Miner on a corpus of 30,000 scientific articles.&lt;BR /&gt;
&lt;BR /&gt;
My first problem is obtaining the full word co-occurrence matrix (not reduced to vectors):&lt;BR /&gt;
&lt;BR /&gt;
each row is a start word,&lt;BR /&gt;
each column is a start word,&lt;BR /&gt;
and each cell holds the number of times the two words co-occur.&lt;BR /&gt;
&lt;BR /&gt;
Is anyone familiar with co-word analysis?&lt;BR /&gt;
Is anyone willing to help me?</description>
      <pubDate>Wed, 01 Jul 2009 08:54:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50498#M288</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-01T08:54:44Z</dc:date>
    </item>
    <item>
      <title>Re: Co Word Analysis with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50499#M289</link>
      <description>Text Miner stores the term-by-document frequency matrix in a compressed form. You will find it in the project data directory of your Text Miner run, in a data set whose label includes the string "OUT". Since a 30,000-document collection can have as many as 500,000 to a million distinct terms, be sure to restrict your terms of interest with a start list. The code below gives an example of building the co-occurrence matrix: it expands the compressed form into an uncompressed document-by-term table and then computes the co-occurrence values with proc corr and the sscp option.&lt;BR /&gt;
&lt;BR /&gt;
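As a sketch of why the sscp option yields co-occurrence values (the symbols x, S, and D are notation assumed here, not SAS names): if x_{di} is the count of term i in document d and D is the number of documents, each entry of the uncorrected cross-product matrix is&lt;BR /&gt;
&lt;BR /&gt;
\[ S_{ij} = \sum_{d=1}^{D} x_{di}\, x_{dj} \]&lt;BR /&gt;
&lt;BR /&gt;
With the toy data in the code below, terms 1 and 3 have counts (1, 2), (0, 0), (1, 2), and (1, 1) in documents 1 to 4, so&lt;BR /&gt;
&lt;BR /&gt;
\[ S_{13} = 1 \cdot 2 + 0 \cdot 0 + 1 \cdot 2 + 1 \cdot 1 = 5 \]&lt;BR /&gt;
&lt;BR /&gt;
The two terms share three documents, so the value is weighted by frequency. If x_{di} is recorded as 1 whenever the term appears, S_{ij} becomes the number of documents that contain both terms.&lt;BR /&gt;
&lt;BR /&gt;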
Thanks.&lt;BR /&gt;
Russ&lt;BR /&gt;
&lt;BR /&gt;
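/* Toy stand-in for the Text Miner OUT data set: one row per term, document, and frequency. */&lt;BR /&gt;
data myOUT;&lt;BR /&gt;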
input term doc count;&lt;BR /&gt;
datalines;&lt;BR /&gt;
1 1 1&lt;BR /&gt;
1 3 1&lt;BR /&gt;
1 4 1&lt;BR /&gt;
2 2 1&lt;BR /&gt;
2 3 2&lt;BR /&gt;
3 1 2&lt;BR /&gt;
3 3 2&lt;BR /&gt;
3 4 1&lt;BR /&gt;
4 2 2&lt;BR /&gt;
4 4 1&lt;BR /&gt;
5 3 2&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc sort data=myOUT;&lt;BR /&gt;
by doc term;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
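/* Expand the compressed table into one row per document with one column per term. */&lt;BR /&gt;
data docbyterm;&lt;BR /&gt;
set myOUT;&lt;BR /&gt;
by doc;&lt;BR /&gt;
array t{5};   /* creates t1-t5; set the size to your number of terms */&lt;BR /&gt;
retain t;&lt;BR /&gt;
/* Reset every term count at the start of each document. */&lt;BR /&gt;
if first.doc then do;&lt;BR /&gt;
   do i=1 to 5;&lt;BR /&gt;
      t{i}=0;&lt;BR /&gt;
   end;&lt;BR /&gt;
end;&lt;BR /&gt;
/* Store the frequency; use t{term}=1 instead to count documents rather than weight by frequency. */&lt;BR /&gt;
t{term}=count;&lt;BR /&gt;
/* Write one row once all terms of the document have been read. */&lt;BR /&gt;
if last.doc then output;&lt;BR /&gt;
keep doc t1-t5;&lt;BR /&gt;
run;&lt;BR /&gt;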
&lt;BR /&gt;
&lt;BR /&gt;
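/* The sscp option adds the raw cross-product matrix to cooccur as the rows with _TYPE_='SSCP'. */&lt;BR /&gt;
proc corr data=docbyterm cov outp=cooccur sscp;&lt;BR /&gt;
var t1-t5;&lt;BR /&gt;
run;</description>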
      <pubDate>Fri, 17 Jul 2009 13:26:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50499#M289</guid>
      <dc:creator>RussAlbright</dc:creator>
      <dc:date>2009-07-17T13:26:40Z</dc:date>
    </item>
    <item>
      <title>Re: Co Word Analysis with SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50500#M290</link>
      <description>Many thanks,&lt;BR /&gt;
&lt;BR /&gt;
Charles</description>
      <pubDate>Fri, 17 Jul 2009 16:55:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Co-Word-Analysis-with-SAS/m-p/50500#M290</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-17T16:55:23Z</dc:date>
    </item>
  </channel>
</rss>

