<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reproducibility of the results - Hp Clus Procedure in SAS Enterprise Miner in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/788961#M9021</link>
    <description>&lt;P&gt;Your reply doesn't really address OP's main concern.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are you confirming that one should expect&amp;nbsp;PROC &lt;SPAN&gt;HPCLUS&amp;nbsp;&lt;/SPAN&gt;to produce different clustering results in different runs, even if the same seed is set&lt;SPAN&gt;? If so, how much should one expect the results to change from run to run (does the HPCLUS implementation have some nice statistical convergence properties despite differing results)?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is there ANY way to ensure that PROC HPCLUS results are reproducible in different sessions given a fixed seed (in Enterprise Miner)? &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I understand that PROC HPCLUS could use distributed computing/parallel processing and all that. However, the fact that a seed (or other parameters) doesn't guarantee&amp;nbsp;the same outputs in different runs might be concerning for management.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;If you were an analyst, how would you convince your manager to use SAS PROC HPCLUS (not reproducible) over R/Python packages (reproducible when a seed is set)?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 07 Jan 2022 20:46:10 GMT</pubDate>
    <dc:creator>marathon2</dc:creator>
    <dc:date>2022-01-07T20:46:10Z</dc:date>
    <item>
      <title>Reproducibility of the results - Hp Clus Procedure in SAS Enterprise Miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/787953#M9016</link>
      <description>&lt;P&gt;I would like to ask about a reproducibility problem with the HP Clus procedure in SAS Enterprise Miner. I cannot reproduce my results when I run the algorithm again with the same seed and the same hyper-parameters. I have also tried saving the diagram as .xml and the whole path as SAS code and running them again, but I still didn't get the same results.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to ask the following:&lt;/P&gt;&lt;P&gt;- Is there any way to reproduce the same clustering solution from the HP Clus procedure in SAS Enterprise Miner?&lt;/P&gt;&lt;P&gt;- To what extent is it tolerable not to reproduce exactly the same results from HP procedures? Is there an accepted way to handle differing clustering solutions from the HP Clus algorithm when it is started with the same seed and the same hyper-parameters?&lt;/P&gt;&lt;P&gt;- Does SAS provide any solution for the reproducibility problems with HP procedures in SAS Enterprise Miner?&lt;/P&gt;&lt;P&gt;- Is the reproducibility problem in HP procedures due to the parallel processing infrastructure?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Dec 2021 18:51:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/787953#M9016</guid>
      <dc:creator>Hss_45</dc:creator>
      <dc:date>2021-12-31T18:51:48Z</dc:date>
    </item>
    <item>
      <title>Re: Reproducibility of the results - Hp Clus Procedure in SAS Enterprise Miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/787987#M9017</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think the issue with reproducibility is indeed linked to&amp;nbsp;&lt;SPAN style="font-family: inherit;"&gt;multithreaded and/or distributed computing.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;You could try to run the HPCLUS procedure this way:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options cpucount=1 NOTHREADS;

PROC HPCLUS data=;
...;
performance nodes=0 NTHREADS=1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;To evaluate the different clustering results and check whether they overlap enough, you can use the techniques described in this paper:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;SAS Global Forum 2019 --&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: inherit;"&gt;Paper 3409-2019&lt;BR /&gt;How to Evaluate Different Clustering Results?&lt;BR /&gt;Ralph Abbey, SAS Institute Inc.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3409-2019.pdf" target="_blank"&gt;https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3409-2019.pdf&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;Cheers,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Jan 2022 15:17:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/787987#M9017</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2022-01-01T15:17:04Z</dc:date>
    </item>
    <item>
      <title>Re: Reproducibility of the results - Hp Clus Procedure in SAS Enterprise Miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/788961#M9021</link>
      <description>&lt;P&gt;Your reply doesn't really address OP's main concern.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are you confirming that one should expect&amp;nbsp;PROC &lt;SPAN&gt;HPCLUS&amp;nbsp;&lt;/SPAN&gt;to produce different clustering results in different runs, even if the same seed is set&lt;SPAN&gt;? If so, how much should one expect the results to change from run to run (does the HPCLUS implementation have some nice statistical convergence properties despite differing results)?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is there ANY way to ensure that PROC HPCLUS results are reproducible in different sessions given a fixed seed (in Enterprise Miner)? &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I understand that PROC HPCLUS could use distributed computing/parallel processing and all that. However, the fact that a seed (or other parameters) doesn't guarantee&amp;nbsp;the same outputs in different runs might be concerning for management.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;If you were an analyst, how would you convince your manager to use SAS PROC HPCLUS (not reproducible) over R/Python packages (reproducible when a seed is set)?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jan 2022 20:46:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/788961#M9021</guid>
      <dc:creator>marathon2</dc:creator>
      <dc:date>2022-01-07T20:46:10Z</dc:date>
    </item>
    <item>
      <title>Re: Reproducibility of the results - Hp Clus Procedure in SAS Enterprise Miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/789006#M9022</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes,&lt;/P&gt;
&lt;P&gt;I would say that &lt;SPAN class="cs53F207AF"&gt;observing (very) small differences is not unexpected with the&amp;nbsp;&lt;SPAN&gt;SAS® Enterprise Miner™ High-Performance Procedures (like HPFOREST, HPCLUS, ...), even if the same seed is used.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;The reason for the difference is the random variation that is associated with multi-threading.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs53F207AF"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs53F207AF"&gt;You can get 100% reproducible results by disabling multi-threading, by specifying&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;&lt;FONT face="courier new,courier"&gt;performance nthreads=1;&lt;/FONT&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;[ NOTE: The SAS system options THREADS | NOTHREADS apply to the client machine on which the&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;SAS high-performance analytical procedures execute. They do not apply to the compute nodes in a&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;distributed environment. ]&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;If you prefer to have repeatability | reproducibility&amp;nbsp;over performance, then try NTHREADS=1 until you encounter a situation in which doing so is not a practical solution. &amp;nbsp;At that time, you can remove the NTHREADS=1 specification and take advantage of multi-threading.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
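&lt;P&gt;One way to check this on your own system is to run the identical step twice and compare the scored outputs. What follows is only a minimal sketch: the data set work.customers, the variable names, the number of clusters, and the seed value are all hypothetical and do not come from the original question. Whether the two outputs match exactly is something to verify, not assume.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical data set, variables, and seed; replace with your own. */
proc hpclus data=work.customers maxclusters=4 seed=12345;
   input income age spend;     /* inputs used for clustering */
   id customer_id;             /* carried into the scored output */
   score out=work.run1;        /* cluster assignment per observation */
   performance nthreads=1;     /* single thread, as suggested above */
run;

/* Identical step; only the output data set name differs. */
proc hpclus data=work.customers maxclusters=4 seed=12345;
   input income age spend;
   id customer_id;
   score out=work.run2;
   performance nthreads=1;
run;

/* Reports any values that differ between the two scored outputs. */
proc compare base=work.run1 compare=work.run2;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If PROC COMPARE still reports differences with NTHREADS=1, something other than threading is probably involved, for example a change in the order of the observations.&lt;/P&gt;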
&lt;P class="cs6FE56D2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;I no longer have access to Enterprise Miner (I am using Viya Model Studio now), so I do not know the equivalent of&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;&lt;FONT face="courier new,courier"&gt;performance nthreads=1;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;in the Enterprise Miner properties banner.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;Anyway, k-means (HPCLUS algorithm) is a very special case. If you shuffle the observations (i.e. change the order), you will also get different results. But that's inherent to the k-means algorithm and how initial seeds are chosen.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs6FE56D2"&gt;&lt;SPAN class="cs53F207AF"&gt;&lt;SPAN&gt;Koen&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 08 Jan 2022 00:40:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Reproducibility-of-the-results-Hp-Clus-Procedure-in-SAS/m-p/789006#M9022</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2022-01-08T00:40:55Z</dc:date>
    </item>
  </channel>
</rss>

