<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Process New Data Set with Clustering Model in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276243#M14592</link>
    <description>&lt;P&gt;Correct, this is an exploratory analysis. The client simply wants to know who their clients are; there is no target to be modelled/predicted.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Jun 2016 12:52:06 GMT</pubDate>
    <dc:creator>bkokster</dc:creator>
    <dc:date>2016-06-09T12:52:06Z</dc:date>
    <item>
      <title>Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/275862#M14546</link>
      <description>&lt;P&gt;I've created some clusters using proc fastclus, and I would like to apply the same clustering rulesets to a new dataset.&lt;/P&gt;&lt;P&gt;I'm assuming that this can be done by measuring the distance of each observation to&amp;nbsp;the cluster centroids, and classifying each observation into its nearest&amp;nbsp;cluster.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a built-in function that allows for this?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2016 04:46:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/275862#M14546</guid>
      <dc:creator>bkokster</dc:creator>
      <dc:date>2016-06-08T04:46:54Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/275881#M14549</link>
      <description>&lt;P&gt;No, you can't. That leads you to another statistical analysis approach: the&amp;nbsp;DISCRIM procedure.&lt;/P&gt;
&lt;P&gt;Why not use logistic regression or a decision tree? Both can handle character and numeric variables, while&amp;nbsp;&lt;SPAN&gt;proc fastclus can't.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2016 06:52:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/275881#M14549</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-06-08T06:52:10Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/275954#M14559</link>
      <description>&lt;P&gt;In PROC FASTCLUS you can use the OUTSTAT= option to create an output data set that contains the centers of each cluster:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc fastclus data=sashelp.iris maxclusters=3 outstat=OutClus;
VAR SepalLength SepalWidth PetalLength PetalWidth;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you have some new data, you can combine the new data with the centers and use PROC DISTANCE to compute the distance between the new obs and the centers.&amp;nbsp; The following uses Euclidean distance, but other distances are available:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data NewData;
input SepalLength SepalWidth PetalLength PetalWidth;
datalines;
50 33 14 2 
62 22 45 15 
59 32 48 18 
64 28 56 22 
67 31 56 24 
;

data ALL;
set OutClus(where=(_TYPE_="CENTER")) NewData;
run;

proc distance data=All prefix=DistFromClus_ out=distances;
var interval(SepalLength SepalWidth PetalLength PetalWidth);
copy Cluster;
run;

/* keep only the new observations (Cluster is missing) and assign each to its nearest center */
data distances;
set distances;
where cluster = .;
array arr[*] DistFromClus_1-DistFromClus_3;
ClusterID = whichn(min(of arr[*]), of arr[*]); /* find index of min value in row */
keep DistFromClus_1-DistFromClus_3 ClusterID;
run;

proc print; run;&lt;/CODE&gt;&lt;/PRE&gt;
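&lt;P&gt;As a possible shortcut, and assuming your SAS/STAT release supports the INSTAT= option of PROC FASTCLUS, you may be able to let the procedure do the assignment itself. This is a sketch, not something tested against your data: it reads the OutClus statistics created above and writes the assigned cluster for each row of NewData to a data set called Scored (a name made up for this example):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: assumes INSTAT= is available in your release.
   Scored is a hypothetical output data set name.
   MAXCLUSTERS=3 simply repeats the value from the original run. */
proc fastclus data=NewData instat=OutClus maxclusters=3 out=Scored;
var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* The OUT= data set should contain the CLUSTER and DISTANCE variables */
proc print data=Scored; run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;No clustering iterations are run in this mode, so the centers stay as they were. Checking that the assignments agree with ClusterID from the PROC DISTANCE step would be a useful sanity test.&lt;/P&gt;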
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2016 13:40:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/275954#M14559</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-06-08T13:40:54Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276154#M14581</link>
      <description>&lt;P&gt;Thanks Rick_SAS, that answers my question!&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2016 05:05:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276154#M14581</guid>
      <dc:creator>bkokster</dc:creator>
      <dc:date>2016-06-09T05:05:47Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276229#M14584</link>
      <description>&lt;PRE&gt;
Rick,
If that is the case, why not combine them and then run proc fastclus?

data want;
 set sashelp.iris  NewData;
run;

proc fastclus data=want .....


&lt;/PRE&gt;</description>
      <pubDate>Thu, 09 Jun 2016 12:03:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276229#M14584</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-06-09T12:03:35Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276235#M14585</link>
      <description>&lt;P&gt;I interpret the question as "I want to build a model on Data A and then score the model on Data B." &amp;nbsp;If you merge the data, then you are using the second set of obs to build the model, which is not the same thing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2016 12:22:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276235#M14585</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-06-09T12:22:00Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276238#M14586</link>
      <description>&lt;P&gt;Yes, that's right. Using your example: Data A is consumer sales for the past month that will be used to cluster consumers. The client will compile a number of strategies around these clusters.&amp;nbsp;At the end of the&amp;nbsp;following month, Data B will be scored with the clusters developed on Data A. So we'll need to understand whether there were any shifts in the clusters that were developed on Data A. Combining Data A and Data B and reclustering could result in totally different clusters and mess with the client's strategies.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2016 12:27:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276238#M14586</guid>
      <dc:creator>bkokster</dc:creator>
      <dc:date>2016-06-09T12:27:37Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276240#M14587</link>
      <description>Rick,
Discriminant analysis does exactly the same thing as "I want to build a model on Data A and then score the model on Data B."
So I suggest the OP use PROC DISCRIM.</description>
      <pubDate>Thu, 09 Jun 2016 12:41:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276240#M14587</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-06-09T12:41:49Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276241#M14591</link>
      <description>&lt;P&gt;Sorry, KSharp,&amp;nbsp;but I disagree. Discriminant analysis is an example of supervised learning. You need a nominal target variable Y with k levels and the goal is to group the explanatory variables X into k groups so that most of Group1 has Y=1, most of Group2 has Y=2, etc.&lt;/P&gt;
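&lt;P&gt;To make the contrast concrete, here is a minimal sketch of how PROC DISCRIM would be called. It uses Sashelp.Iris only because that data set happens to include a class variable (Species); NewData and Scored are names chosen for illustration. The CLASS statement is required, and it needs a known target in the training data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Illustration only: new observations without a class label */
data NewData;
input SepalLength SepalWidth PetalLength PetalWidth;
datalines;
50 33 14 2
62 22 45 15
;

/* Train on labeled data, then classify the unlabeled rows */
proc discrim data=sashelp.iris testdata=NewData testout=Scored;
class Species;   /* the known target Y that supervised learning requires */
var SepalLength SepalWidth PetalLength PetalWidth;
run;&lt;/CODE&gt;&lt;/PRE&gt;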
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Clustering is a form of unsupervised learning. The OP did not mention a target variable. In unsupervised learning you only have X and you want to group the observations together, often by using some distance metric.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2016 12:49:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276241#M14591</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-06-09T12:49:57Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276243#M14592</link>
      <description>&lt;P&gt;Correct, this is an exploratory analysis. The client simply wants to know who their clients are; there is no target to be modelled/predicted.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2016 12:52:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276243#M14592</guid>
      <dc:creator>bkokster</dc:creator>
      <dc:date>2016-06-09T12:52:06Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276246#M14593</link>
      <description>&lt;PRE&gt;
Rick,

Yeah. As you said, cluster analysis needs only a TRAIN table, not a TEST table, while discriminant analysis needs
both TRAIN and TEST tables. That is the biggest difference between them.

Since the OP doesn't have a Y variable (doesn't know which obs belong to which Y), why not combine them
together and let PROC FASTCLUS tell you?

&lt;/PRE&gt;</description>
      <pubDate>Thu, 09 Jun 2016 13:03:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276246#M14593</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-06-09T13:03:45Z</dc:date>
    </item>
    <item>
      <title>Re: Process New Data Set with Clustering Model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276248#M14595</link>
      <description>If the OP knows for sure which obs belong to which Y, why not use discriminant analysis?</description>
      <pubDate>Thu, 09 Jun 2016 13:16:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Process-New-Data-Set-with-Clustering-Model/m-p/276248#M14595</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-06-09T13:16:22Z</dc:date>
    </item>
  </channel>
</rss>

