Extract model information from GLMSELECT procedure

05-21-2015 12:20 PM

I am fitting spline regression models using GLMSELECT. The code looks like this:

PROC GLMSELECT DATA=DATESET;
  CLASS X;
  EFFECT SPL=SPLINE(X/SPLIT DEGREE=3);
  MODEL Y = X SPL/SELECTION=STEPWISE(CHOOSE=CV SELECT=SBC) CVMETHOD=INDEX(GROUP);
  BY W Z;
RUN;

With the BY statement, this fits around 200 models.

Now I want to summarize the results in a table that contains information from each model. My questions are:

1. I would like a table listing all the models and the variables used in each one. Can I produce a table with the models as columns and the variables as rows?

2. How can I output the RMSE, coefficient of variation, and other fit statistics for all models into one table?

3. What is the difference between CHOOSE= and SELECT=? In PROC REG, SELECTION= alone is enough to select the best model. How does CHOOSE= work in the GLMSELECT procedure? From my reading, it seems that SELECT= produces several models rather than one. How should I understand this? If I want the best predictive model, should I set CHOOSE=CV and SELECT=CV?

Thank you very much.

Posted in reply to Frank_Zhao

05-21-2015 05:03 PM

The following may help you to learn about CHOOSE, etc.
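
As a rough sketch of the distinction, as I understand the GLMSELECT documentation: SELECT= is the criterion that decides which effect enters or leaves at each step, so it generates a sequence of candidate models, one per step. CHOOSE= is the criterion that picks a single final model out of that sequence. The two calls below reuse the statements from the question unchanged and differ only in the SELECT= suboption; they are an illustration, not a recommendation for either setting.

```sas
/* Steps are ordered by SBC; the final model is the step with the best CV score */
proc glmselect data=DATESET;
  class X;
  effect SPL = spline(X / split degree=3);
  model Y = X SPL / selection=stepwise(select=sbc choose=cv)
                    cvmethod=index(GROUP);
  by W Z;
run;

/* Cross validation drives both the step order and the final choice.
   Expect this to be slower, since CV is evaluated for every candidate effect. */
proc glmselect data=DATESET;
  class X;
  effect SPL = spline(X / split degree=3);
  model Y = X SPL / selection=stepwise(select=cv choose=cv)
                    cvmethod=index(GROUP);
  by W Z;
run;
```

Comparing the selection summary tables from the two runs shows how the sequence of steps changes with SELECT= while CHOOSE= only changes which step is reported as the selected model.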

05-21-2015 11:08 PM

Thank you, Ivm.

Could you also take a look at the first two questions?

Posted in reply to Frank_Zhao

05-22-2015 12:01 PM

You will need the ODS OUTPUT tables (the User's Guide lists them in the section just before the examples). Each table in the displayed output corresponds to an ODS table that you can save as a data set with the ODS OUTPUT statement. You may need to merge two or more of these data sets in a DATA step after the model fitting. I don't have a specific example because I haven't used GLMSELECT much. With a BY statement, each saved data set stacks the results for all groups, with the BY variables identifying each group.
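
A minimal sketch of that approach, using the statements from the question. The output data set names (sel_effects, par_est, fit_long, fit_wide, par_long, par_wide) and the model label variable are hypothetical. I am assuming the FitStatistics table has the columns Label1 and nValue1 and that ParameterEstimates has Parameter and Estimate; run PROC CONTENTS on the saved data sets to confirm before relying on this.

```sas
/* BY-group processing needs the input sorted by the BY variables */
proc sort data=DATESET;
  by W Z;
run;

/* Save ODS tables as data sets; each one stacks the rows for all W*Z groups */
ods output SelectedEffects=sel_effects
           ParameterEstimates=par_est
           FitStatistics=fit_long;
proc glmselect data=DATESET;
  class X;
  effect SPL = spline(X / split degree=3);
  model Y = X SPL / selection=stepwise(choose=cv select=sbc)
                    cvmethod=index(GROUP);
  by W Z;
run;

/* Question 2: one row per model, one column per fit statistic */
proc transpose data=fit_long out=fit_wide(drop=_name_);
  by W Z;
  id Label1;      /* assumed column holding the statistic name */
  var nValue1;    /* assumed column holding the numeric value  */
run;

/* Question 1: parameters as rows, models as columns */
data par_long;
  set par_est;
  length model $32;
  model = catx('_', 'M', W, Z);   /* hypothetical label built from the BY values */
run;
proc sort data=par_long;
  by Parameter;
run;
proc transpose data=par_long out=par_wide(drop=_name_);
  by Parameter;
  id model;
  var Estimate;
run;
```

In par_wide, a missing value means the parameter was not selected in that model. If the coefficient of variation is not among the fit statistics, it can be computed in a DATA step from the root MSE and the dependent mean columns of fit_wide.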

But I have some serious concerns about the model you are fitting. It looks like you are trying to decide whether to use a linear term in X, a cubic spline, or both for each group (essentially testing for curvilinearity?). The SPLIT option makes the procedure treat each term of the spline basis as a separate effect. With splines, the individual terms don't mean much on their own; they are arbitrary building blocks whose only purpose is to produce meaningful predictions. Ending up with, say, only the second of four terms of a spline is pretty meaningless. I would take out SPLIT so that the spline is considered as a single term in the model.

I also don't see the point of treating X as a factor (CLASS statement). This can create very strange spline basis functions with a very large number of terms. I didn't even think it would run until I tried it just now on some data, and I think the results are meaningless. Plus, with X as a factor in your example, X itself will capture any nonlinearity, leaving nothing for the spline function to represent.
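
A minimal sketch of the model with those two changes applied, keeping everything else from the question as posted. This only illustrates the suggestion and has not been run on the original data.

```sas
proc glmselect data=DATESET;
  /* No CLASS statement: X stays a continuous variable */
  /* No SPLIT: all basis columns of SPL enter or leave the model together */
  effect SPL = spline(X / degree=3);
  model Y = X SPL / selection=stepwise(choose=cv select=sbc)
                    cvmethod=index(GROUP);
  by W Z;
run;
```

With this setup, each BY group ends up with the linear term, the whole spline, or neither, rather than an arbitrary subset of spline basis columns.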

05-22-2015 01:04 PM

Thank you very much. I was also wondering whether SPLIT is really needed in my case. Do you mean I can use SEPARATE if I have more than one variable in the spline, as in spline(x y z)?

By the way, the code above is not exactly what I use. I did not use X as both a classification variable and a numeric variable at the same time.

Posted in reply to Frank_Zhao

05-22-2015 01:56 PM

My view is that you should treat the spline as one term.

05-22-2015 03:29 PM

I see. Thank you.

05-22-2015 11:25 AM

One more question:

If I use / KNOTMETHOD=MULTISCALE SPLIT in my code, I get spl_Temperature_S0:5 in my model. I understand this to mean the 5th basis function at scale 0, but I am confused. According to the online SAS documentation, there should be 2^i basis functions at scale i, so at scale 0 there should be only 1. What does the 5 in spl_Temperature_S0:5 mean?

Thank you