Frank_Zhao
Calcite | Level 5

I am fitting spline regression models using GLMSELECT.  The code looks like this:

PROC GLMSELECT DATA=DATESET;

CLASS  X;

EFFECT SPL=SPLINE(X/SPLIT DEGREE=3);

MODEL Y = X SPL/SELECTION=STEPWISE(CHOOSE=CV SELECT=SBC) CVMETHOD=INDEX(GROUP);

BY W Z;

RUN;

Therefore, I will get around 200 models through the BY statement.

Now I want to summarize my results in a table that contains information from each model.  So my questions are:

1. I would like a table containing all the models and the variables used in each model.  Can I produce a table with the model names as columns and the variables as rows?

2. How can I output the RMSE, coefficient of variation, and other statistics in one table for all models?

3. By the way, I need to know the difference between CHOOSE= and SELECT=.  In PROC REG, SELECT= alone is enough to select the best model.  How does CHOOSE= work in the GLMSELECT procedure?  From my reading, it seems that SELECT= produces a sequence of models rather than a single one.  How should I understand this?  If I want the best predictive model, should I set CHOOSE=CV and SELECT=CV?

Thank you very much.

7 REPLIES 7
lvm
Rhodochrosite | Level 12

The following may help you to learn about CHOOSE, etc.

http://www2.sas.com/proceedings/sugi31/207-31.pdf

Frank_Zhao
Calcite | Level 5

Thank you, lvm.

Can you also take a look at the first two questions?

lvm
Rhodochrosite | Level 12

You will have to look at the ODS OUTPUT tables (there is a list in the User's Guide right before the examples). Each table in the output corresponds to a different data set that can be saved with the ODS OUTPUT statement. You may need to merge two or more of these in a DATA step after fitting the models. I don't have a specific example here because I haven't used GLMSELECT a lot. With a BY statement, these output data sets will be stacked, with the results for each BY group identified.
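The approach described above could look roughly like the sketch below. The model statements are copied unchanged from the question. The ODS table names (FitStatistics, SelectedEffects, ParameterEstimates) are ones GLMSELECT produces; the output data set names (fit_all, eff_all, pe_all, fit_wide) are made up for illustration, and the column names Label1 and nValue1 are an assumption that should be checked with PROC CONTENTS before relying on them.

```sas
/* BY-group processing needs the data sorted by the BY variables. */
proc sort data=DATESET;
   by W Z;
run;

/* Save three of the displayed tables as data sets.           */
/* With a BY statement, each data set is stacked over W and Z. */
ods output FitStatistics      = fit_all   /* Root MSE, R-square, SBC, ...  */
           SelectedEffects    = eff_all   /* effects in the selected model */
           ParameterEstimates = pe_all;   /* estimates for selected model  */

proc glmselect data=DATESET;
   class X;
   effect SPL = spline(X / split degree=3);
   model Y = X SPL / selection=stepwise(choose=cv select=sbc)
                     cvmethod=index(GROUP);
   by W Z;
run;

/* One row per BY group, one column per fit statistic.         */
/* Assumes fit_all has Label1 (statistic name) and nValue1     */
/* (numeric value); verify with PROC CONTENTS DATA=fit_all.    */
proc transpose data=fit_all out=fit_wide(drop=_name_);
   by W Z;
   id Label1;
   var nValue1;
run;
```

The eff_all and pe_all data sets already carry W and Z on every row, so a table of variables per model can be built from them with a further PROC TRANSPOSE or a DATA step merge.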

But I have some serious concerns about the model you are fitting. It looks like you are trying to decide whether to use a linear model in X, a cubic spline, or both for each group (essentially trying to see if there is curvilinearity?). Your use of the SPLIT option will make each term of the spline (knot) a separate candidate term. With splines, the individual terms don't mean much on their own (they are arbitrary constructions used to get predictions that do mean something). Ending up with only the second of four (or however many) terms of a spline is pretty meaningless. I would take out SPLIT; that way, the spline will be considered as a single term in the model. I also don't see the point of treating X as a factor (CLASS statement). This can create very strange spline basis functions (with many, many terms). I didn't even think this would work until I tried it just now on some data. I think the results are meaningless. Plus, with X as a factor in your example, X itself will capture any nonlinearity, leaving nothing for the spline function to represent.
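As a minimal sketch of that suggestion, the posted step with the CLASS statement and the SPLIT option removed might look like this. The data set and variable names are those from the question; everything else in the step is left as posted, so this is only an illustration of the change, not a recommended final model.

```sas
/* Sketch: X stays numeric (no CLASS statement) and the spline */
/* is defined without SPLIT, so all of its columns enter or    */
/* leave the model together as the single effect SPL.          */
proc glmselect data=DATESET;
   effect SPL = spline(X / degree=3);
   model Y = X SPL / selection=stepwise(choose=cv select=sbc)
                     cvmethod=index(GROUP);
   by W Z;
run;
```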

Frank_Zhao
Calcite | Level 5

Thank you very much.  I am also unsure whether SPLIT is really needed in my case. Do you mean I can use SEPARATE if I have more than one variable in spline(x y z)?

By the way, the code is not exactly the same as what I use. I did not use X as a class variable and a numeric variable at the same time.

lvm
Rhodochrosite | Level 12

My view is that you should treat the spline as one term.

Frank_Zhao
Calcite | Level 5

I see.  Thank you.

Frank_Zhao
Calcite | Level 5

One more question:

If I use /knotmethod=multiscale split in my code, I get spl_Temperature_S0:5 in my model.  I understand this to mean the 5th basis function at scale 0, but I am confused.  From the online material on the SAS website, there should be 2^i basis functions at scale i. Therefore, at scale 0 there should be only 1 basis function. What does the 5 in spl_Temperature_S0:5 mean?

Thank you
