Statistical Procedures

Programming the statistical procedures from SAS
WilliamMunoz

Hi,

 

I want to tune the hyperparameters of a logistic regression model using CAS capabilities. My dataset has a _partind_ column that marks the partition (train='1', validate='0', test='2').
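For concreteness, the partition sizes can be checked with a frequency count on the partition column. This is a minimal sketch that assumes the data are already loaded as a CAS table; the table name MYDATA is hypothetical.

```sas
proc cas;
   /* Count rows per partition value ('1' = train, '0' = validate, '2' = test). */
   /* MYDATA is a placeholder for the in-memory table that holds _partind_.     */
   simple.freq /
      table={name='MYDATA'},
      inputs={'_partind_'};
   run;
quit;
```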

 

I think PROC LOGSELECT is ruled out: although it is CAS-enabled, it has no AUTOTUNE statement, unlike PROC GRADBOOST.

 

I'm currently trying the autotune action set, following the example in https://documentation.sas.com/doc/da/pgmsascdc/v_061/casactml/casactml_autotune_examples66.htm. However, the documentation is not quite clear to me.

 

Question 1:

Given the following code (taken from the link above):

 proc cas noqueue;
     autotune.tuneLogistic /
         trainOptions={
             table={name='GETSTARTED'},
             class={{vars={'C'}}},
             model={
                 depVars={{name='y'}},
                 effects={
                     {vars={'C', 'x1', 'x2', 'x3', 'x4', 'x5',
                          'x6', 'x7', 'x8', 'x9', 'x10'}}
                 }
             },
             savestate={name="logistic_getstarted_model"}
         }
         tunerOptions={seed=12345}

         /* Tuning Parameters

         You do not need to specify any tuning parameters for the default
         tuning process. If you want to make adjustments to the default
         tuning process, uncomment the following block of code and change
         any of the tuning parameters' attributes.

         tuningParameters={
            {name="method",
               valueList={"BACKWARD", "FORWARD", "LASSO", "NONE", "STEPWISE"},
                                         initValue="STEPWISE",   exclude=false},
           {name="slEntry",    lb=0.01, ub=0.99, initValue=0.05, exclude=false},
           {name="slStay",     lb=0.01, ub=0.99, initValue=0.05, exclude=false},
           {name="stopHorizon",lb=1,    ub=5,    initValue=3,    exclude=false},
           {name="lassoRho",   lb=0.1,  ub=0.9,  initValue=0.8,  exclude=false},
           {name="lassoSteps", lb=10,   ub=100,  initValue=20,   exclude=false}
         }

         */

     ;
     ods output TunerResults      = TuneResults(keep=MisclassErr);
     ods output EvaluationHistory = EvalHistory;
     ods output IterationHistory  = IterHistory;
     run;
  quit;

and considering the tuningParameters={{autotuneTuningParmDefinition-1},...} section of the documentation (https://documentation.sas.com/doc/da/pgmsascdc/v_061/casactml/cas-autotune-tunelogistic.htm#SAS.cas-...), which states that each autotuneTuningParmDefinition-n has a namePath="string" parameter (alias: name) that "specifies the name path of a tuning parameter. For a nested action parameter, this parameter specifies a dot-separated path that includes all its parent parameter names. For a top-level action parameter, this parameter is simply the name of the parameter."

 

Is the user supposed to fully specify the path of the parameter being referenced from the logistic action? If so, how does SAS Viya know that {name="method",...} in the example code above refers to the parameter hierarchy regression.logistic / selection={method="BACKWARD" | "ELASTICNET" | "FORWARD" | "LASSO" | "NONE" | "STEPWISE"}? (I assume this is the hyperparameter referenced in the example code because the options are so similar; see https://documentation.sas.com/doc/en/pgmsascdc/v_061/casactstat/cas-regression-logistic.htm#SAS.cas-...).

 

Shouldn't {name="method",...} be {name="selection.method",...}, given that the logistic action has several other "method" parameters, such as regression.logistic / polynomial={{standardize={method="MOMENTS" | "MRANGE" | "WMOMENTS"}, ...}, ...}?
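To make the nesting concrete, here is a minimal sketch of a direct call to the logistic action with the selection settings written out. It assumes, based on the linked action documentation, that method, slEntry, and slStay are sub-parameters of selection; the effect list is shortened for brevity. It only illustrates where those names would sit in the hierarchy and does not show how autotune resolves a bare name such as "method".

```sas
proc cas;
   /* Direct call to regression.logistic on the GETSTARTED table from the example. */
   /* Assumption: method, slEntry, and slStay are nested under selection, so their */
   /* dot-separated paths would be selection.method, selection.slEntry, and so on. */
   regression.logistic /
      table={name='GETSTARTED'},
      class={{vars={'C'}}},
      model={
         depVars={{name='y'}},
         effects={{vars={'C', 'x1', 'x2', 'x3'}}}
      },
      selection={method='STEPWISE', slEntry=0.05, slStay=0.05};
   run;
quit;
```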

 

Question 2:

Given the predefined train-validate-test partition of my dataset, I think I should use userDefinedPartition=TRUE|FALSE, which "when set to True, includes a user-defined partition for training and scoring" (https://documentation.sas.com/doc/da/pgmsascdc/v_061/casactml/cas-autotune-tunelogistic.htm#SAS.cas-...). But how? The documentation gives no indication of how to tell SAS which column holds the partition and which values identify each role.
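For comparison, the logistic action itself accepts a partition column through its partByVar parameter. The sketch below shows that direct call, with a hypothetical table name MYDATA and placeholder variables y, x1, and x2. Whether tuneLogistic honors partByVar inside trainOptions when userDefinedPartition=true is an assumption that the passages quoted above do not confirm.

```sas
proc cas;
   /* Direct logistic fit that uses the existing partition column.        */
   /* MYDATA, y, x1, and x2 are placeholders, not names from the example. */
   regression.logistic /
      table={name='MYDATA'},
      partByVar={name='_partind_', train='1', validate='0', test='2'},
      model={
         depVars={{name='y'}},
         effects={{vars={'x1', 'x2'}}}
      };
   run;
quit;
```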
