<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why the output of the proc hpsplit is uncertain in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743164#M36164</link>
    <description>&lt;P&gt;No, there's no general rule to set the seed.&lt;/P&gt;
&lt;P&gt;It can be any strictly positive (&amp;gt;0) integer.&lt;/P&gt;
&lt;P&gt;You just set the seed to get a reproducible result.&lt;/P&gt;
&lt;P&gt;But the seed should not be important for the final model. Whatever the seed, the resulting models should be very comparable (not identical, but very comparable), at least if they are good models that capture the underlying pattern in the data well. Because the seed is not an important factor, many people just use 12345.&lt;/P&gt;
&lt;P&gt;If different seeds result in very different models, there's a problem somewhere, I would say!&lt;/P&gt;
&lt;P&gt;Of course, if there's one family of models that could "suffer" a bit from the seed selection, it is TREES, because their response surface is so discrete (not smooth). When your age is X years minus one day you branch to the left, and when your age is X years you branch to the right, and the two cases might end up in leaves with a significantly different (predicted) response value.&lt;/P&gt;
&lt;P&gt;Kind regards,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
    <pubDate>Sat, 22 May 2021 19:01:45 GMT</pubDate>
    <dc:creator>sbxkoenk</dc:creator>
    <dc:date>2021-05-22T19:01:45Z</dc:date>
    <item>
      <title>Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743130#M36158</link>
      <description>&lt;P&gt;I ran the following code several times and got different output each time. The SAS/STAT version is 15.1. Is the nodestats= option incompatible with this version?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc hpsplit data=train leafsize=2213;
   model loan_status = mths_since_last_delinq;
   output nodestats=hp_tree;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 22 May 2021 14:11:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743130#M36158</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-22T14:11:01Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743131#M36159</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Which version of SAS are you using? Find out by submitting:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%PUT &amp;amp;=sysvlong;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suppose you will always get the same result if you specify a seed:&lt;/P&gt;
&lt;DIV class="xis-refProc"&gt;
&lt;DIV id="stathpug.hpsplit.proc_stmt" class="AAsection"&gt;
&lt;DIV id="stathpug.hpsplit.tab_proc"&gt;
&lt;DIV class="-contents"&gt;
&lt;TABLE class="AAtabular"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P data-unlink="true"&gt;SEED=&amp;nbsp;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;Specifies the random number seed to use for cross validation&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;like&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc hpsplit data=train leafsize=2213 seed=1014;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Sat, 22 May 2021 14:29:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743131#M36159</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-05-22T14:29:36Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743141#M36160</link>
      <description>Thanks Koen.&lt;BR /&gt;Your solution works. But when I tried a different seed, such as 1234, I got different results. So what is the role of the SEED= option? Is there a rule for selecting the seed?&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Jun&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 22 May 2021 17:13:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743141#M36160</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-22T17:13:16Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743155#M36161</link>
      <description>&lt;P&gt;Hello &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/338831"&gt;@su35&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is the general definition for a seed in SAS.&lt;/P&gt;
&lt;DIV class="xis-glossary"&gt;
&lt;DL class="xis-glossaryTermDefPair"&gt;
&lt;DT id="n0hhmp679arov8n1st80c3m8xxz3" class="xis-glossaryTerm"&gt;&lt;FONT&gt;seed =&amp;nbsp;&lt;/FONT&gt;an initial value from which &lt;FONT&gt;a&lt;/FONT&gt; random number function or CALL routine calculates &lt;FONT&gt;a&lt;/FONT&gt; random value.&lt;/DT&gt;
&lt;DD class="xis-glossaryDefinition"&gt;&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;In &lt;EM&gt;k&lt;/EM&gt;-fold cross-validation (used in HPSPLIT) the data have to be split in &lt;EM&gt;k&lt;/EM&gt; distinct&amp;nbsp;sets with (about) equal n° of observations.&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;( I don't know about the exact value of &lt;EM&gt;k&lt;/EM&gt; in HPSPLIT. ,&amp;nbsp;it's&amp;nbsp;&lt;SPAN&gt;not relevant to your question&lt;/SPAN&gt; )&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;This data split in &lt;EM&gt;k&lt;/EM&gt; sets is done using a (pseudo-) random number generator.&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;The (pseudo-) random number generator uses a strictly positive seed for initialization.&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;Using the same seed ensures reproducibility of the random number series, using a different seed results in a different set of random numbers.&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;Using NO seed means the seed will default to the computer clock time which is always different for consecutive runs. That's why you got different results for PROC HPSPLIT in subsequent runs when not using a seed.&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;&lt;/DD&gt;
&lt;DD class="xis-glossaryDefinition"&gt;&lt;SPAN style="font-family: inherit;"&gt;Kind regards,&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN style="font-family: inherit;"&gt;Koen&lt;/SPAN&gt;&lt;/DD&gt;
&lt;/DL&gt;
&lt;/DIV&gt;</description>
      <pubDate>Sat, 22 May 2021 18:21:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743155#M36161</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-05-22T18:21:58Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743159#M36162</link>
      <description>If the result is dependent on the seed, is there a general rule to set the seed?</description>
      <pubDate>Sat, 22 May 2021 18:39:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743159#M36162</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-22T18:39:46Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743161#M36163</link>
      <description>odd/even?</description>
      <pubDate>Sat, 22 May 2021 18:46:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743161#M36163</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-22T18:46:24Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743164#M36164</link>
      <description>&lt;P&gt;No, there's no general rule to set the seed.&lt;/P&gt;
&lt;P&gt;It can be any strictly positive (&amp;gt;0) integer.&lt;/P&gt;
&lt;P&gt;You just set the seed to get a reproducible result.&lt;/P&gt;
&lt;P&gt;But the seed should not be important for the final model. Whatever the seed, the resulting models should be very comparable (not identical, but very comparable), at least if they are good models that capture the underlying pattern in the data well. Because the seed is not an important factor, many people just use 12345.&lt;/P&gt;
&lt;P&gt;If different seeds result in very different models, there's a problem somewhere, I would say!&lt;/P&gt;
&lt;P&gt;Of course, if there's one family of models that could "suffer" a bit from the seed selection, it is TREES, because their response surface is so discrete (not smooth). When your age is X years minus one day you branch to the left, and when your age is X years you branch to the right, and the two cases might end up in leaves with a significantly different (predicted) response value.&lt;/P&gt;
&lt;P&gt;Kind regards,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Sat, 22 May 2021 19:01:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743164#M36164</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-05-22T19:01:45Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743169#M36165</link>
      <description>I use PROC HPSPLIT to discretize the interval variables and to collapse the levels of the ordinal and nominal variables. I run the following code:&lt;BR /&gt;proc hpsplit data=train leafsize=2213 seed=;&lt;BR /&gt;model loan_status =mths_since_last_delinq;&lt;BR /&gt;output nodestats=hp_tree;&lt;BR /&gt;run;&lt;BR /&gt;If seed=1113, mths_since_last_delinq is split into 7 bins. If seed=1111, mths_since_last_delinq couldn't be split.&lt;BR /&gt;Regards,&lt;BR /&gt;Jun</description>
      <pubDate>Sat, 22 May 2021 19:40:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743169#M36165</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-22T19:40:03Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743219#M36166</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/338831"&gt;@su35&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That's very weird.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It may happen exceptionally (this 'big' discrepancy between results), but the fact that you just bump into 2 random seeds where this happens is remarkable. Are you sure everything is OK with the data? Do you have enough observations?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anyway, I would get rid of the cross-validation (CV), as your goal is just to discretize one interval variable (or collapse the levels of one nominal / ordinal variable). Without CV, no seed is involved:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC HPSPLIT CVMETHOD=NONE ...;
...
run;&lt;/CODE&gt;&lt;/PRE&gt;
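&lt;P&gt;As a minimal sketch, assuming the same data set and variables as in the original post, the complete step without cross-validation might look like this. No SEED= option is given, because with CVMETHOD=NONE no random assignment of observations to folds should take place:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: data set and variable names are taken from the question */
proc hpsplit data=train leafsize=2213 cvmethod=none;
   model loan_status = mths_since_last_delinq;
   output nodestats=hp_tree;
run;&lt;/CODE&gt;&lt;/PRE&gt;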
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Sun, 23 May 2021 14:00:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743219#M36166</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-05-23T14:00:49Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743287#M36167</link>
      <description>&lt;P&gt;From the documentation on using random number functions:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;DIV id="p026ygl6toz3tgn14lt4iu6cl5bb" class="xis-topic"&gt;
&lt;DIV id="n1nhvm9m4y798sn1rvwg8zldva5o" class="xis-subTopic"&gt;
&lt;H2 class="xis-title"&gt;Seed Values&lt;/H2&gt;
&lt;DIV id="n0tucvwcubp1y1n1v92tsk6boa9d" class="xis-topicContent"&gt;
&lt;DIV id="n06oftv99oj0mzn1krjq0f8jpoqt" class="xis-paragraph"&gt;Random-number functions and CALL routines generate streams of pseudo-random numbers from an initial starting point, called a &lt;SPAN class="xis-userSuppliedValue"&gt;seed&lt;/SPAN&gt;, that either the user or the computer clock supplies. A seed must be a nonnegative integer with a value less than 2&lt;SUP&gt;31&lt;/SUP&gt;–1 (or 2,147,483,647). If you use a positive seed, you can always replicate the stream of random numbers by using the same DATA step. If you use zero as the seed, the computer clock initializes the stream, and the stream of random numbers cannot be replicated.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/BLOCKQUOTE&gt;
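&lt;P&gt;To illustrate the quoted passage, here is a small sketch. The data set name DRAWS and the seed 12345 are arbitrary choices, not part of the documentation excerpt. It uses CALL STREAMINIT with a positive seed, so rerunning the step should reproduce the same five values; without a positive seed the values would typically change from run to run:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data draws;
   call streaminit(12345);   /* positive seed: repeatable stream */
   do i = 1 to 5;
      u = rand('uniform');   /* pseudo-random value between 0 and 1 */
      output;
   end;
run;

proc print data=draws noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;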
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Which value to set is &lt;STRONG&gt;your&lt;/STRONG&gt; decision.&lt;/P&gt;</description>
      <pubDate>Mon, 24 May 2021 02:29:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743287#M36167</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-05-24T02:29:21Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743288#M36168</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/338831"&gt;@su35&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;I use PROC HPSPLIT to discretize the interval variables and to collapse the levels of the ordinal and nominal variables. I run the following code:&lt;BR /&gt;proc hpsplit data=train leafsize=2213 seed=;&lt;BR /&gt;model loan_status =mths_since_last_delinq;&lt;BR /&gt;output nodestats=hp_tree;&lt;BR /&gt;run;&lt;BR /&gt;If seed=1113, mths_since_last_delinq is split into 7 bins. If seed=1111, mths_since_last_delinq couldn't be split.&lt;BR /&gt;Regards,&lt;BR /&gt;Jun&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Show the LOG from the run where it "couldn't split". Copy the text for the entire PROC HPSPLIT step plus any notes, warnings or other messages. Then open a text box on the forum with the &amp;lt;/&amp;gt; icon and paste the text. The text box is important because it preserves the text formatting of any diagnostics that SAS places in the log. The message windows on this forum reformat text and may make the diagnostics less useful or harder to read.&lt;/P&gt;</description>
      <pubDate>Mon, 24 May 2021 02:32:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743288#M36168</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-05-24T02:32:30Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743399#M36175</link>
      <description>&lt;PRE&gt;7877   proc hpsplit data=train leafsize=2213 assignmissing=none seed=1111;
7878   model loan_status =mths_since_last_delinq;
7879   output nodestats=work.hp_tree;
7880   run;

NOTE: The HPSPLIT procedure is executing in single-machine mode.
NOTE: Cross-validating using 10 folds.
NOTE: There were 44249 observations read from the data set LOANRISK.TRAIN.
NOTE: The data set WORK.HP_TREE has 1 observations and 25 variables.
NOTE: PROCEDURE HPSPLIT used (Total process time):
      real time           1.36 seconds
      cpu time            0.92 seconds


7881   proc hpsplit data=train leafsize=2213 assignmissing=none seed=1113;
7882   model loan_status =mths_since_last_delinq;
7883   output nodestats=work.hp_tree;
7884   run;

NOTE: The HPSPLIT procedure is executing in single-machine mode.
NOTE: Cross-validating using 10 folds.
NOTE: There were 44249 observations read from the data set LOANRISK.TRAIN.
NOTE: The data set WORK.HP_TREE has 15 observations and 25 variables.
NOTE: PROCEDURE HPSPLIT used (Total process time):
      real time           1.36 seconds
      cpu time            1.00 seconds
&lt;/PRE&gt;
&lt;P&gt;From the log above, we can see that with seed=1111, work.hp_tree has one observation, but with seed=1113, work.hp_tree has 15 observations.&lt;/P&gt;</description>
      <pubDate>Mon, 24 May 2021 17:08:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743399#M36175</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-24T17:08:42Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743512#M36179</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/338831"&gt;@su35&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You have enough observations (44249).&lt;/P&gt;
&lt;P&gt;What's the cardinality of the input variable "mths_since_last_delinq"? In other words, how many distinct levels (distinct values) does it have? You can find out with PROC FREQ, PROC SQL or PROC CARDINALITY (the latter procedure exists only in Viya, not in SAS 9.4).&lt;/P&gt;
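&lt;P&gt;For example, a sketch along these lines (assuming the data set is TRAIN, as in the original post; the column aliases are made up) reports the number of distinct nonmissing values and the number of missing values with PROC SQL, and the number of levels with the NLEVELS option of PROC FREQ:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* COUNT(DISTINCT ...) ignores missing values; NMISS counts them */
   select count(distinct mths_since_last_delinq) as n_levels,
          nmiss(mths_since_last_delinq)          as n_missing
   from train;
quit;

/* NLEVELS prints a table with the number of levels; NOPRINT suppresses the frequency table */
proc freq data=train nlevels;
   tables mths_since_last_delinq / noprint;
run;&lt;/CODE&gt;&lt;/PRE&gt;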
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Tue, 25 May 2021 10:08:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743512#M36179</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-05-25T10:08:13Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743860#M36191</link>
      <description>The variable "mths_since_last_delinq" is a count of months; it has 107 distinct levels and 48% missing values. I treat it as an interval variable.</description>
      <pubDate>Wed, 26 May 2021 13:11:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743860#M36191</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-05-26T13:11:46Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743955#M36195</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/338831"&gt;@su35&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OK,&amp;nbsp;&lt;SPAN&gt;107 distinct levels (+1 level for missing, …&amp;nbsp;I guess these are accounts which never had a delinquency) is enough to consider that variable as an interval input.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Very strange that you stumbled upon 2 random seeds that give such different results.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I guess most of the other seeds you can imagine (any number &amp;gt; 0) will result in solution 1 or solution 2, no?&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN&gt;If 10 more seeds give you the split, then that split should be done.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;If these 10 more seeds result in no split, then no split should be done.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
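&lt;P&gt;One way to run such a check is a small macro loop over a list of seeds. This is only a sketch: the macro name TRY_SEEDS, the seed list and the output data set names are made up for illustration, while the PROC HPSPLIT step itself is the one from the original post. The number of observations in each NODESTATS= data set, as reported in the log, shows whether a split was made:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro try_seeds(seeds=);
   %local i seed;
   %do i = 1 %to %sysfunc(countw(&amp;amp;seeds));
      %let seed = %scan(&amp;amp;seeds, &amp;amp;i);
      /* one tree per seed; node statistics go to work.hp_tree_SEED */
      proc hpsplit data=train leafsize=2213 seed=&amp;amp;seed;
         model loan_status = mths_since_last_delinq;
         output nodestats=work.hp_tree_&amp;amp;seed;
      run;
   %end;
%mend try_seeds;

/* ten arbitrary positive seeds */
%try_seeds(seeds=101 202 303 404 505 606 707 808 909 1010)&lt;/CODE&gt;&lt;/PRE&gt;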
&lt;P&gt;&lt;SPAN&gt;But again, you can also work without cross-validation: no seed is needed and you always get the same solution, unless you are doing heavily distributed processing, in which case minor differences are possible.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Heavily distributed processing, as you can do in SAS Viya, does not always give deterministic results.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 May 2021 16:53:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/743955#M36195</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-05-26T16:53:36Z</dc:date>
    </item>
    <item>
      <title>Re: Why the output of the proc hpsplit is uncertain</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/745243#M36300</link>
      <description>Hi Koen,&lt;BR /&gt;I tried different seeds and got lots of different results: splits into 5, 7,... bins.&lt;BR /&gt;For my purpose, I accept your solution: work without cross-validation.&lt;BR /&gt;Thanks&lt;BR /&gt;Jun</description>
      <pubDate>Wed, 02 Jun 2021 17:03:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Why-the-output-of-the-proc-hpsplit-is-uncertain/m-p/745243#M36300</guid>
      <dc:creator>su35</dc:creator>
      <dc:date>2021-06-02T17:03:59Z</dc:date>
    </item>
  </channel>
</rss>

