<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Randomly splitting data for training and data set for conditional logistic regression in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307890#M16292</link>
    <description>&lt;P&gt;While I wouldn't be surprised if PROC SURVEYSELECT can do this, you can certainly cut down the number of steps:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If not already sorted, start there:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=have;&lt;/P&gt;
&lt;P&gt;by id;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then just a single step will split the data:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data train test;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;by id;&lt;/P&gt;
&lt;P&gt;if first.id then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;if ranuni(12345) &amp;lt; 0.7 then destination = 'train';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;else destination = 'test';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;retain destination;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;if destination = 'train' then output train;&lt;/P&gt;
&lt;P&gt;else output test;&lt;/P&gt;
&lt;P&gt;drop destination;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;</description>
    <pubDate>Fri, 28 Oct 2016 11:47:12 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2016-10-28T11:47:12Z</dc:date>
    <item>
      <title>Randomly splitting data for training and data set for conditional logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307875#M16289</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large data set for conditional logistic regression that I want to split into two sets: train and test. The data format is as follows:&lt;/P&gt;&lt;P&gt;ID Y X&lt;/P&gt;&lt;P&gt;1 1 10&lt;/P&gt;&lt;P&gt;1 0 12&lt;/P&gt;&lt;P&gt;1 0 13&lt;/P&gt;&lt;P&gt;2 0 20&lt;/P&gt;&lt;P&gt;2 1 5&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;10000 0 11&lt;/P&gt;&lt;P&gt;10000 0 8&lt;/P&gt;&lt;P&gt;10000 1 16&lt;/P&gt;&lt;P&gt;10000 0 14&lt;/P&gt;&lt;P&gt;What I want is randomly pick ID with a ratio say, 7:3 on 10000 ID for train:test, and obtaining all the rows with the same ID.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Meanwhile, how can I compute the predicted probabilities after running PROC LOGISTIC with a STRATA ID statement?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your kind assistance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 10:02:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307875#M16289</guid>
      <dc:creator>acma</dc:creator>
      <dc:date>2016-10-28T10:02:32Z</dc:date>
    </item>
    <item>
      <title>Re: Randomly splitting data for training and data set for conditional logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307877#M16290</link>
      <description>&lt;P&gt;First, build a table with distinct IDs&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have (keep=id) out=id nodupkey;
by id;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table id as
select distinct id
from have
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;or, if have is already sorted&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data id;
set have (keep=id);
by id;
if first.id;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Separate that into two datasets:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data train test;
set id;
if rand('uniform') &amp;lt;= 0.3
then output test;
else output train;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then you can merge back into your original dataset.&lt;/P&gt;
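&lt;P&gt;A minimal sketch of that merge, assuming train holds only the selected IDs as created above. The output names train_full and test_full are hypothetical, chosen here for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* MERGE with BY needs both inputs sorted by id */
proc sort data=have;
by id;
run;

proc sort data=train;
by id;
run;

/* in_train is 1 when the current id exists in the train lookup */
data train_full test_full;
merge have (in=in_have) train (in=in_train);
by id;
if in_have;
if in_train then output train_full;
else output test_full;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because every ID lands in either train or test, merging against train alone is enough here; any row whose ID is not in train goes to test_full.&lt;/P&gt;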
&lt;P&gt;Depending on the state of your original dataset, you could create the lookup datasets by combining steps 3 &amp;amp; 4.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 10:29:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307877#M16290</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-10-28T10:29:27Z</dc:date>
    </item>
    <item>
      <title>Re: Randomly splitting data for training and data set for conditional logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307890#M16292</link>
      <description>&lt;P&gt;While I wouldn't be surprised if PROC SURVEYSELECT can do this, you can certainly cut down the number of steps:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If not already sorted, start there:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=have;&lt;/P&gt;
&lt;P&gt;by id;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then just a single step will split the data:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data train test;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;by id;&lt;/P&gt;
&lt;P&gt;if first.id then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;if ranuni(12345) &amp;lt; 0.7 then destination = 'train';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;else destination = 'test';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;retain destination;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;if destination = 'train' then output train;&lt;/P&gt;
&lt;P&gt;else output test;&lt;/P&gt;
&lt;P&gt;drop destination;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 11:47:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/307890#M16292</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-10-28T11:47:12Z</dc:date>
    </item>
    <item>
      <title>Re: Randomly splitting data for training and data set for conditional logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/308070#M16300</link>
      <description>&lt;PRE&gt;
data have;
 do id=1 to 100;
  do x=1 to 10;
   output;
  end;
 end;
run;

data train test;
 set have;
 by id;
 retain idx;
 if first.id then idx=ceil(100*rand('uniform'));
 if idx le 30 then output test;
  else output train;
drop idx;
run;

&lt;/PRE&gt;</description>
      <pubDate>Sat, 29 Oct 2016 05:45:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/308070#M16300</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-10-29T05:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: Randomly splitting data for training and data set for conditional logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/403232#M21032</link>
      <description>&lt;P&gt;"What I want is randomly pick ID with a ratio say, 7:3 on 10000 ID for train:test, and obtaining all the rows with the same ID."&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can do this directly with PROC SURVEYSELECT now, using the SAMPLINGUNIT statement. For example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc surveyselect data=have out=want method=srs samprate=0.70
         outall seed=12345 noprint;
  samplingunit id;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The OUTALL option outputs both the selected and unselected units. The automatic output variable SELECTED equals 1 for the selected units and 0 for the unselected units. In this case, the units are the IDs: 70% of the ID values are randomly selected, and each sampled ID includes all the observations for that ID value.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 16:16:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Randomly-splitting-data-for-training-and-data-set-for/m-p/403232#M21032</guid>
      <dc:creator>Zard</dc:creator>
      <dc:date>2017-10-11T16:16:59Z</dc:date>
    </item>
  </channel>
</rss>

