<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to build a model within 11,651 variables? in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155437#M40808</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Partial Least Squares Regression was designed for this case. It is often used when the number of variables far exceeds the number of data points, for example, in spectroscopy, where you might have measured the intensity at 10,000 wavelengths on 179 samples. You can find lots of examples in the literature of PLS models for spectroscopy that are similar to your case, with many times more X variables than observations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With PLS, you get data reduction in the sense that it will find linear combinations of your X variables that are predictive of Y. This is better than PCA or factor analysis, where you get linear combinations of your X variables that might not be predictive at all, since the Y values are not used in determining the PCA/factor analysis dimensions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But PLS won't give you individual X variables that are in the model; it's not designed to. And as others have pointed out, there is no logical way to pick individual Xs from your 11,651 Xs that are in the model. So give up this idea of using stepwise regression.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 19 Dec 2014 19:09:38 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2014-12-19T19:09:38Z</dc:date>
    <item>
      <title>How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155426#M40797</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a .txt dataset which has 11,653 variables and 179 observations (see attachment). The variable "name" is just the id; I don't need it. The variable "score" is the response variable and the remaining 11,651 variables are the explanatory variables. I know those 11,651 variables are not all important, so I need to select the most significant ones to build and fit a model. After I imported the .txt file into SAS, I tried stepwise regression in PROC HPREG, but SAS reported insufficient memory. I cannot change the MEMSIZE option (currently 2G) because I am running the code on a server at my university.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here is my code:&lt;/P&gt;&lt;P&gt;proc hpreg data = P2222;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model score = P12050301--P60598281;&amp;nbsp;&amp;nbsp; *P12050301 is the first explanatory variable, P60598281 is the last explanatory variable;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; selection method = stepwise(select=sl sle=0.25 sls=0.25 maxeffects=170);&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are there other methods I could use to select the best model? I am a complete beginner. Thank you for your help!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Elaine&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Dec 2014 22:55:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155426#M40797</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-17T22:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155427#M40798</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You need to reduce your variables before you do a regression. &lt;/P&gt;&lt;P&gt;Ideally you'd know something about all your variables and then you could apply business knowledge as well as statistical techniques. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One way is to use forward selection (generally not recommended), which is to regress each variable against the dependent variable and only use those that are significant. You also need to check for correlation between your independent variables. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You'll also need to consider the variable types, i.e. categorical, numerical, ordinal, and treat them appropriately in your variable selection method.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Because you have 179 observations, you'll want FEWER THAN that many variables in your regression; otherwise you have a dimensionality problem and your matrices won't invert. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Dec 2014 23:18:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155427#M40798</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-12-17T23:18:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155428#M40799</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't know the solution to the performance issue here, but...&lt;/P&gt;&lt;P&gt;If your explanatory variables (&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13.3333330154419px; background-color: #ffffff;"&gt;&lt;STRONG&gt;P12050301--P60598281&lt;/STRONG&gt;) were independent random numbers, unrelated to &lt;STRONG&gt;score&lt;/STRONG&gt;, you would &lt;EM&gt;certainly&lt;/EM&gt; find a model that fits your data perfectly. Fitting that model is almost like trying to solve N linear equations for N variables.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13.3333330154419px; background-color: #ffffff;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13.3333330154419px; background-color: #ffffff;"&gt;Try setting apart a small set of, say, 20 observations. Build your model based on the rest of the data and then test the resulting model on the small set. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Dec 2014 23:29:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155428#M40799</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2014-12-17T23:29:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155429#M40800</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Try data reduction techniques to identify variables explaining most of the variation before proceeding to modeling. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 02:03:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155429#M40800</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-12-18T02:03:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155430#M40801</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, thank you for your reply. I understand what you mean here. All my variables are numerical. I don't know anything about those variables, so I cannot select them using prior knowledge. That's why I used stepwise regression to let SAS select the important ones. But right now, because of the memory problem, I cannot get the result.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 19:46:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155430#M40801</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-18T19:46:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155431#M40802</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, thank you for your reply. I may not quite understand you. I tried the stepwise regression but SAS reported insufficient memory. This means there are too many explanatory variables (11,651), so SAS doesn't have enough memory for this proc. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did you mean I should separate the dataset into several small ones and build a model on each of the small datasets? Actually I tried it. I separated the whole dataset into 2 small ones: the first one has the first 5,826 explanatory variables and the second one has the remaining 5,825 variables. Then I ran the stepwise regression on both of them and built 2 models. The first model selected 10 variables and R square=0.24, the second model selected 155 variables and R square=1.0. But if I separate the whole dataset in a different way, say the first 5,000 variables and then the remaining 6,651 variables, those models select different variables and the R square is different.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now I am really confused about it. How can I separate the dataset and get the "best" result?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Elaine&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 19:58:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155431#M40802</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-18T19:58:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155432#M40803</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You mean reduction methods like PCA? I used stepwise regression; won't SAS select the important variables automatically?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Elaine&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 19:59:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155432#M40803</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-18T19:59:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155433#M40804</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yes, factor analysis/PCA will provide a few dimensions that can be used in stepwise regression.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 21:11:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155433#M40804</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-12-18T21:11:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155434#M40805</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Elaine,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Divide your dataset the other way by creating two classes of observations (not variables) with a &lt;STRONG&gt;weight&lt;/STRONG&gt; variable:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Randomly flag roughly 21 of the 179 observations as a holdout set (weight = 0) */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;data splitData;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;set myData;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;if rand("UNIFORM") &amp;lt; (21/179) then weight = 0;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;else weight = 1;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Fit on the weight = 1 observations only; zero-weight observations do not enter the fit */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc reg data=splitData;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;model score = ... ;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;weight weight;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;output out=outData p=predScore;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Compare observed and predicted scores on the holdout observations only */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc corr data=outData(where=(weight=0)) Pearson;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;var score; with predScore;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc sgplot data=outData(where=(weight=0));&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;scatter x=score y=predScore;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"&lt;SPAN style="font-size: 13.3333330154419px; background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&lt;EM&gt;the second model selected 155 variables and R square=1.0&lt;/EM&gt;&lt;/SPAN&gt;". This is what I warned you about. Contrary to intuition, you have very little chance of creating a meaningful model from that many variables via a variable selection method.
The biggest challenge you face is your lack of knowledge about these variables. They can probably be grouped into closely related subsets that could each be represented by one &lt;EM&gt;best&lt;/EM&gt; variable or a single Principal Component.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 21:16:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155434#M40805</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2014-12-18T21:16:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155435#M40806</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi PG,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is an interesting approach. Just to learn more about your recommended method, could you please provide more info on this, like how you are getting 155 variables?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Naeem&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Dec 2014 21:34:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155435#M40806</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-12-18T21:34:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155436#M40807</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I like thinking about this one in the following way:&amp;nbsp; The OP measured over 11K variables about 180 times, so we have roughly 65 times as many variables as cases.&amp;nbsp; You can RANDOMLY select any 179 of the variables and get a PERFECT fit to the response variable. In fact there are (thank you, Wolfram Alpha):&lt;/P&gt;&lt;P&gt;176958671964634377740198691965577713682480110474545351945677139031026937863665740570836480756356258632972678963277&lt;/P&gt;&lt;P&gt;474969010136808169597920994461712218986603493340022182644089536811284759493681290314266912794597881035605378960409&lt;/P&gt;&lt;P&gt;924200421757503946290949030028695642411309948914571542814736217126680552692397498583913574843384467106219631759603&lt;/P&gt;&lt;P&gt;05286963384521740723958639266557563907788742389725738156000 possible perfect fits (that's about 1.77E&lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt;400&lt;/STRONG&gt;&lt;/SPAN&gt;), and stepwise regression will pick one of those.&amp;nbsp; Given that the total number of protons and electrons in the universe is on the order of 300 orders of magnitude less than this, you may consider stepwise methods as an exercise in futility.&amp;nbsp; Finding the "right" one is impossible in P Log P time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So the best I could think of is to include the response variable in the mix and look at principal components.&amp;nbsp; Find those that have large loadings on the dependent variable and explain most of the variability.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Actually, the best idea is to find a subject matter expert and whittle the 11K plus variables down to, say, 8 or 10, which is about how many your 179 cases can accurately estimate.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 18:53:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155436#M40807</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2014-12-19T18:53:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155437#M40808</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Partial Least Squares Regression was designed for this case. It is often used when the number of variables far exceeds the number of data points, for example, in spectroscopy, where you might have measured the intensity at 10,000 wavelengths on 179 samples. You can find lots of examples in the literature of PLS models for spectroscopy that are similar to your case, with many times more X variables than observations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With PLS, you get data reduction in the sense that it will find linear combinations of your X variables that are predictive of Y. This is better than PCA or factor analysis, where you get linear combinations of your X variables that might not be predictive at all, since the Y values are not used in determining the PCA/factor analysis dimensions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But PLS won't give you individual X variables that are in the model; it's not designed to. And as others have pointed out, there is no logical way to pick individual Xs from your 11,651 Xs that are in the model. So give up this idea of using stepwise regression.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 19:09:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155437#M40808</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2014-12-19T19:09:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155438#M40809</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;True!&amp;nbsp; And it can incorporate &lt;A __default_attr="2746" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt;'s work regarding validation, using the CROSSVAL options.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I was still afraid that 11K right-hand side variables could overwhelm even PLS, but given your endorsement for that size, I will defer.&amp;nbsp; It is the method to use.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And if the OP is interested in variable reduction, they can use the output as an exploratory tool for identifying strongly determining factors and strongly redundant variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 19:16:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155438#M40809</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2014-12-19T19:16:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155439#M40810</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Well, I didn't say the OP has enough hardware resources, but that's a different issue. PLS doesn't require inverting a matrix if you use the NIPALS algorithm (which is the default in PROC PLS), so it doesn't really require huge amounts of memory, and it's pretty fast. If her machine can handle it, then that's the way to go.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 19:27:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155439#M40810</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2014-12-19T19:27:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155440#M40811</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I did try PCA and used the first 160 components. The MSE is 0.0106. But I didn't get the coefficients of these components, and I don't know what these components are. I got the results but just don't know how to interpret them.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 21:01:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155440#M40811</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-19T21:01:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155441#M40812</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi PG,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks again for your reply. Yes, you are right. There are so many variables and I don't have enough knowledge about them. So simply using regression didn't give me good results.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I tried a decision tree and PCA today. The decision tree selected 15 variables out of 11,651 and the MSE is 0.139. For the PCA method, there are 160 components in the model and the MSE is 0.0106. However, I didn't get the coefficients of these components, so I don't know how to interpret it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You talked about dividing the original dataset into 2 small ones by separating the observations. One is used for building a model, the other is used for testing. Why can't I just use all the observations to build the model and then test it? Will splitting make the result more accurate and reasonable? And what does weight mean, and what is its purpose?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best wishes!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Elaine&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 22:09:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155441#M40812</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-19T22:09:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155442#M40813</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks so much for your reply. You just gave me so much information, and I need some time to understand it since I am a totally new user. At least I know stepwise is not the solution. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;BTW, what is OP short for?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 23:15:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155442#M40813</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-19T23:15:58Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155443#M40814</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Thanks so much for your reply. You just gave me so much information, and I need some time to understand it since I am a totally new user. At least I know stepwise is not the solution. :)&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 23:17:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155443#M40814</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-19T23:17:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155444#M40815</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OP is short for Original Poster.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm not a statistician, but I have taken numerous statistics courses.&amp;nbsp; I think that factor analysis, or PCA, will be a good first step to reduce your number of variables.&amp;nbsp; I.e., you could create combined scores that take into account collections of grouped measures. However, that would require business knowledge to determine whether the results (i.e., what to combine) seem to make sense.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG's proposal was simply to create a model on a sample of your data and see if it held up on another sample of your data.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 23:22:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155444#M40815</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2014-12-19T23:22:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to build a model within 11,651 variables?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155445#M40816</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yes, that's what I am thinking now: do the variable reduction first. I tried PCA today and &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;there are 160 components in the model, where the MSE is 0.0106. However, I didn't get the coefficients of these components, so I don't know how to interpret it.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I guess what PG means is N-fold cross-validation.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Elaine&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 23:36:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-build-a-model-within-11-651-variables/m-p/155445#M40816</guid>
      <dc:creator>call_me_elaine</dc:creator>
      <dc:date>2014-12-19T23:36:04Z</dc:date>
    </item>
  </channel>
</rss>

