<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to split data into train and test sets, and use the model built from train set to predict data in test set? in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183834#M46756</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi! I am a junior SAS analyst.&lt;/P&gt;&lt;P&gt;I want to split my data into training and test sets and use the model built on the training set to predict the data in the test set. The data set has 50,000 or more observations.&lt;/P&gt;&lt;P&gt;The easiest way I can think of is to use PROC SURVEYSELECT to draw a random sample of observations from the whole data set. For example, I can ask SAS to randomly sample 30% as the test set (leaving the remaining 70% as the training set):&lt;/P&gt;&lt;P&gt;PROC SURVEYSELECT DATA=whole.data OUT=test.set METHOD=srs SAMPRATE=0.3;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;Now I have the test set in the data set 'test.set'. However:&lt;/P&gt;&lt;P&gt;1. How can I create a data set (e.g. 'train.set') that holds the remaining 70% of the data?&lt;/P&gt;&lt;P&gt;2. After using 'train.set' to build a predictive model (e.g. a linear model), how can I use that model to predict the data in 'test.set', with output that shows every predicted value and residual?&lt;/P&gt;&lt;P&gt;Thanks for your patience!&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 24 Nov 2014 15:19:18 GMT</pubDate>
    <dc:creator>DavidWang</dc:creator>
    <dc:date>2014-11-24T15:19:18Z</dc:date>
    <item>
      <title>How to split data into train and test sets, and use the model built from train set to predict data in test set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183834#M46756</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi! I am a junior SAS analyst.&lt;/P&gt;&lt;P&gt;I want to split my data into training and test sets and use the model built on the training set to predict the data in the test set. The data set has 50,000 or more observations.&lt;/P&gt;&lt;P&gt;The easiest way I can think of is to use PROC SURVEYSELECT to draw a random sample of observations from the whole data set. For example, I can ask SAS to randomly sample 30% as the test set (leaving the remaining 70% as the training set):&lt;/P&gt;&lt;P&gt;PROC SURVEYSELECT DATA=whole.data OUT=test.set METHOD=srs SAMPRATE=0.3;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;Now I have the test set in the data set 'test.set'. However:&lt;/P&gt;&lt;P&gt;1. How can I create a data set (e.g. 'train.set') that holds the remaining 70% of the data?&lt;/P&gt;&lt;P&gt;2. After using 'train.set' to build a predictive model (e.g. a linear model), how can I use that model to predict the data in 'test.set', with output that shows every predicted value and residual?&lt;/P&gt;&lt;P&gt;Thanks for your patience!&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Nov 2014 15:19:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183834#M46756</guid>
      <dc:creator>DavidWang</dc:creator>
      <dc:date>2014-11-24T15:19:18Z</dc:date>
    </item>
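For question 2, one common single-pass approach (not taken from this thread) is sketched below, assuming whole.data holds a response named y and predictors named x1 and x2. These names, the work data set names, and the SEED= value are hypothetical. The sketch flags the split with OUTALL and copies the response into a new variable that is missing for test rows, so only training rows enter the fit. The OUTPUT statement still writes a predicted value for every row whose predictors are present. PROC REG cannot form a residual where the fitted response is missing, so the test-set residual is computed against the true y in a DATA step.

```sas
/* Hypothetical names: y is the response, x1 and x2 are predictors. */

/* Flag a 30% simple random sample; OUTALL keeps every row and adds
   Selected (1 = sampled, used here as test; 0 = not sampled, train). */
proc surveyselect data=whole.data out=all outall method=srs
                  samprate=0.3 seed=12345;
run;

/* Copy the response only for training rows; test rows get missing. */
data all2;
  set all;
  if selected = 0 then y_train = y;
  else y_train = .;
run;

/* Rows with missing y_train are excluded from the estimation, but
   OUTPUT still writes a predicted value for them because x1 and x2
   are present. */
proc reg data=all2;
  model y_train = x1 x2;
  output out=scored p=pred;
run;
quit;

/* Keep the test rows and compute residuals against the true response. */
data test_scored;
  set scored;
  where selected = 1;
  resid = y - pred;
run;

proc print data=test_scored;
  var y pred resid;
run;
```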
    <item>
      <title>Re: How to split data into train and test sets, and use the model built from train set to predict data in test set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183835#M46757</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Just add OUTALL to the statement. The output data set (here named all) then contains every observation plus a flag variable "selected", which is 1 for the sampled 30% (the test sample) and 0 for the remaining observations, which can serve as the training set. So you can use selected=0 as the training data for model development and selected=1 for testing.&lt;/P&gt;&lt;P&gt;PROC SURVEYSELECT DATA=whole.data &lt;STRONG&gt;outall&lt;/STRONG&gt; OUT=all METHOD=srs SAMPRATE=0.3;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Nov 2014 15:42:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183835#M46757</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-11-24T15:42:46Z</dc:date>
    </item>
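Building on the OUTALL call above, a DATA step can write the two groups to separate data sets in one pass, which answers question 1 directly. This is a minimal sketch: the data set names train and test are placeholders, and the SEED= value is an arbitrary example added so that the same split is drawn on every run.

```sas
/* OUTALL keeps all rows and adds Selected:
   1 = sampled (test), 0 = not sampled (train). */
proc surveyselect data=whole.data outall out=all method=srs
                  samprate=0.3 seed=12345;
run;

/* Route each row by its flag; one pass writes both data sets. */
data train test;
  set all;
  if selected = 1 then output test;
  else output train;
run;
```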
    <item>
      <title>Re: How to split data into train and test sets, and use the model built from train set to predict data in test set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183836#M46758</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi! Thanks for your prompt reply!&lt;/P&gt;&lt;P&gt;But I still have some questions:&lt;/P&gt;&lt;P&gt;1. How do I create the flag variable "selected" and assign it the values 1 and 0?&lt;/P&gt;&lt;P&gt;2. Is 'outall' a SAS keyword or just a name I choose?&lt;/P&gt;&lt;P&gt;If it is convenient, could you share the detailed steps?&lt;/P&gt;&lt;P&gt;Sorry, I am not used to data management.&lt;/P&gt;&lt;P&gt;Many thanks!&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Nov 2014 15:57:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183836#M46758</guid>
      <dc:creator>DavidWang</dc:creator>
      <dc:date>2014-11-24T15:57:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to split data into train and test sets, and use the model built from train set to predict data in test set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183837#M46759</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Just try the syntax given above. The flag variable "selected" is created automatically in the data set "all". OUTALL is part of the PROC SURVEYSELECT syntax, and "all" is the name of the resulting data set.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Nov 2014 16:09:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183837#M46759</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-11-24T16:09:10Z</dc:date>
    </item>
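A short sketch of how the flag can be checked and then used as-is. It assumes the data set all was created with OUTALL as above. The response y and predictors x1 and x2 are hypothetical names. With METHOD=SRS and SAMPRATE=0.3, the frequency table should show 30% (up to rounding) with selected=1. A WHERE statement restricts the fit to training rows without creating a separate data set.

```sas
/* Confirm the split proportions produced by SURVEYSELECT. */
proc freq data=all;
  tables selected;
run;

/* Fit on training rows only by filtering on the flag.
   Hypothetical names: y is the response, x1 and x2 are predictors. */
proc reg data=all;
  where selected = 0;
  model y = x1 x2;
run;
quit;
```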
    <item>
      <title>Re: How to split data into train and test sets, and use the model built from train set to predict data in test set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183838#M46760</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you so much for your kindness!&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Nov 2014 16:29:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183838#M46760</guid>
      <dc:creator>DavidWang</dc:creator>
      <dc:date>2014-11-24T16:29:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to split data into train and test sets, and use the model built from train set to predict data in test set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183839#M46761</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I have successfully split the whole data set into two parts, a training set and a test set, and I used PROC FREQ to check that the split matches the proportions I need. It worked, thanks!&lt;/P&gt;&lt;P&gt;Now I have used the training set (only the selected=0 data) to build a linear model and estimate the betas. However, I do not know how to use this fitted model to predict the data in the test set.&lt;/P&gt;&lt;P&gt;In brief: how do I use a fitted model to predict (or validate) the data in the test set?&lt;/P&gt;&lt;P&gt;Warm regards,&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 25 Nov 2014 04:08:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-split-data-into-train-and-test-sets-and-use-the-model/m-p/183839#M46761</guid>
      <dc:creator>DavidWang</dc:creator>
      <dc:date>2014-11-25T04:08:18Z</dc:date>
    </item>
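One way to score the test set with a model fitted on the training set is to save the model as an item store and score new data with PROC PLM. This is a sketch, assuming SAS 9.22 or later, where PROC GLM supports the STORE statement and PROC PLM is available. PROC GLM is used here because it fits the same least-squares linear model as PROC REG for continuous predictors. The names y, x1, x2, work.linmod, test and test_scored are hypothetical, and all is the OUTALL data set with the selected flag. The RESIDUAL keyword needs the observed response to be present in the scored data.

```sas
/* Hypothetical names: y is the response, x1 and x2 are predictors. */

/* Fit on training rows and save the fitted model as an item store. */
proc glm data=all;
  where selected = 0;
  model y = x1 x2;
  store work.linmod;
run;
quit;

/* Separate the test rows. */
data test;
  set all;
  where selected = 1;
run;

/* Score the test rows with the stored model, writing the predicted
   value and the residual (observed y minus predicted) for each row. */
proc plm restore=work.linmod;
  score data=test out=test_scored predicted=pred residual=resid;
run;

proc print data=test_scored;
  var y pred resid;
run;
```

Because the item store is kept separate from the fitting step, the same stored model can later score other data sets without refitting.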
  </channel>
</rss>

