<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Clustering project in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306019#M16219</link>
    <description>&lt;P&gt;I am a statistics student.&amp;nbsp; For this project I have to use WEKA, but I thought it was a good opportunity to learn some new SAS as well.&amp;nbsp; I'll be working in Base SAS and SAS/STAT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a dataset with 130 college courses over 5 years and 448 students.&amp;nbsp; I would like to find groups of 3 or 4 classes that students commonly take together, so I can recommend them as concentrations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm looking for some ideas to get started, and I will continue the research on my own.&amp;nbsp; Is K-Means the right approach for something with this many values?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Mark&lt;/P&gt;</description>
    <pubDate>Thu, 20 Oct 2016 17:40:58 GMT</pubDate>
    <dc:creator>Steelers_In_DC</dc:creator>
    <dc:date>2016-10-20T17:40:58Z</dc:date>
    <item>
      <title>Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306019#M16219</link>
      <description>&lt;P&gt;I am a statistics student.&amp;nbsp; For this project I have to use WEKA, but I thought it was a good opportunity to learn some new SAS as well.&amp;nbsp; I'll be working in Base SAS and SAS/STAT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a dataset with 130 college courses over 5 years and 448 students.&amp;nbsp; I would like to find groups of 3 or 4 classes that students commonly take together, so I can recommend them as concentrations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm looking for some ideas to get started, and I will continue the research on my own.&amp;nbsp; Is K-Means the right approach for something with this many values?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Mark&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 17:40:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306019#M16219</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2016-10-20T17:40:58Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306053#M16220</link>
      <description>&lt;P&gt;What you describe is not that many values... for SAS. What is the role of years in your data? Are your groups classes that should be taken in the same year? What about students who took them in different years?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 19:28:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306053#M16220</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-10-20T19:28:36Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306057#M16221</link>
      <description>&lt;P&gt;I mentioned the number of values because my previous project used the Iris dataset, so this seems like a lot by comparison.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a trend analysis showing classes that have low to zero enrollment over time, but for this exercise the time is irrelevant.&amp;nbsp; If I can look at it over time I will, but I don't think that would be in my deliverable.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My goal is to find classes, in groups of 3 or 4, that are common among students.&amp;nbsp; I will be recommending that the school initiate minors with these concentrations.&amp;nbsp; Year over year is irrelevant.&amp;nbsp; The classes would not have to be taken in any time frame or any order (no prerequisites).&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 19:33:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306057#M16221</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2016-10-20T19:33:11Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306330#M16226</link>
      <description>&lt;P&gt;I built a large flat file, numbering students per class per quarter, students per semester, students per year.&amp;nbsp; But I was thinking that the only thing I wanted to cluster was students by class.&amp;nbsp; I've been programming for years, but this is all very new to me.&amp;nbsp; Does a two-variable dataset make sense, with just student and class?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 15:27:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306330#M16226</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2016-10-21T15:27:38Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306391#M16227</link>
      <description>&lt;P&gt;You could try an approach like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Example dataset. Random course assignment will not cluster very well. */
data courses;
    call streamInit(79781);   /* fixed seed so the example is reproducible */
    length courseId $12;
    courseTaken = 1;
    do student = 1 to 100;
        do course = 1 to 20;
            courseId = cats("Course_", course);
            /* each student takes each course with probability 0.25 */
            if rand("uniform") &amp;lt; 0.25 then output;
        end;
    end;
    drop course;
run;

proc sort data=courses; by courseId student; run;

/* Create table with courses as rows and students as columns */
proc transpose data=courses out=courseTable(drop=_name_) prefix=student_;
by courseId;
var courseTaken;
id student;
run;

/* replace missing with zeros */
proc stdize data=courseTable reponly missing=0 out=courseTable0; 
var student_:;
run;

/* DMATCH is based on simple matching: two courses are close when most
   students either took both of them or took neither */
proc distance data=courseTable0 method=dmatch out=courseDistance shape=square;
var nominal (student_:);
id courseId;
run;
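
/* Optional variation (an assumption, not part of the original reply):
   Jaccard distance ignores students who took neither course, which may
   separate courses better when enrollment is sparse. Zeros are assumed
   to be treated as "absent" for numeric ANOMINAL variables. The data set
   name courseDistJ is hypothetical; to try it, point proc modeclus at it. */
proc distance data=courseTable0 method=djaccard out=courseDistJ shape=square;
var anominal (student_:);
id courseId;
run;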

/* Find clusters using nonparametric clustering. DOCK=2 dissolves
 clusters of one or two courses, leaving those courses unassigned. */
proc modeclus data=courseDistance out=courseClus method=1 dock=2;
id courseId;
var course_:;
run;
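
/* Variation from the follow-up reply in this thread: adding R=0.55
   (the neighborhood radius) gave three clusters on the example data.
   courseClusR is a hypothetical output name; R= will likely need
   tuning for real data. */
proc modeclus data=courseDistance out=courseClusR method=1 dock=2 r=0.55;
id courseId;
var course_:;
run;

/* Count courses per cluster; unassigned courses have a missing CLUSTER. */
proc freq data=courseClusR;
tables cluster / missing;
run;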
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Courses do not form tight clusters in this random example, but real-life data should cluster better.&amp;nbsp; You can try other distance metrics or other clustering procedures and methods.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 17:41:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306391#M16227</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-10-21T17:41:09Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306417#M16228</link>
      <description>&lt;P&gt;That is awesome, thank you very much. I do have a follow-up question. With my data there are many unclassified objects, which I suspected.&amp;nbsp; Given the small dataset, I didn't think it was necessary to remove that data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I get one cluster with my data, the same as when I ran your code.&amp;nbsp; I'm not sure what to do with that information.&amp;nbsp; If I want to get more clusters, do I need to prep the data, or is there something wrong with the process?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 19:17:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306417#M16228</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2016-10-21T19:17:27Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering project</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306463#M16231</link>
      <description>&lt;P&gt;As I said, you can change the distance metric or the clustering method. With my example above, I get three clusters when I add the option R=0.55 to proc modeclus.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 22:03:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-project/m-p/306463#M16231</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-10-21T22:03:30Z</dc:date>
    </item>
  </channel>
</rss>

