jacob_klimek
Obsidian | Level 7

Hi,

 

I was wondering whether I am correct that PROC GLMSELECT cannot produce confidence intervals in the results output window when I use it for a regression. I know the usual procedure is to post a question with data and code, but I don't really know where to start with this one. The only workaround I have thought of is to use PROC GLMSELECT to grab the selected variable names and output them to a data set, then build a macro variable that feeds into a PROC GLM step with the CLM option specified after the MODEL statement.

 

Any help would be great thanks.

 

 

Accepted Solutions
Rick_SAS
SAS Super FREQ

As stated in the documentation, "PROC GLMSELECT provides results (displayed tables, output data sets, and macro variables) that make it easy to take the selected model and explore it in more detail in a subsequent procedure such as REG or GLM." However, to get inferential statistics and hypothesis tests, you should select a model and then use a procedure such as PROC GLM.

 

You don't need to create your own macro variables because PROC GLMSELECT already provides one.

See the section "Macro variables containing selected models" for an example of using macro variables for post-selection analysis.  In particular, after you run PROC GLMSELECT, you can use the _GLSIND macro variable to run the selected model in PROC GLM, as shown in this example:

proc glmselect data=sashelp.cars;
   class origin;
   model mpg_city = MSRP|wheelbase|weight|cylinders|origin @2;
run;

%put &=_GLSIND;

proc glm data=sashelp.cars;
   class origin;
   model mpg_city = &_GLSIND / solution CLPARM; /* or CLB for PROC REG */
run;
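
For the PROC REG variant mentioned in the comment above, here is a minimal sketch. It assumes the candidate effects are restricted to continuous main effects, because PROC REG has no CLASS statement and does not accept crossed effects. The variable list and the SELECTION=STEPWISE choice are illustrative only.

```sas
/* Sketch: continuous main effects only, so that the contents of
   _GLSIND are valid in a PROC REG MODEL statement */
proc glmselect data=sashelp.cars;
   model mpg_city = MSRP wheelbase weight cylinders horsepower
         / selection=stepwise;
run;

%put &=_GLSIND;   /* write the selected effects to the log */

proc reg data=sashelp.cars;
   model mpg_city = &_GLSIND / clb;   /* CLB: confidence limits for the parameter estimates */
run;
quit;
```

If the selected model can contain CLASS variables or interactions, PROC GLM with CLPARM, as in the example above, is the safer choice.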

 

 

 

 

jacob_klimek
Obsidian | Level 7
Great, thanks. This is more or less what I was looking for.
cau83
Pyrite | Level 9

We use GLMSelect along with GLM in a 3 step process where we want to quickly do some budget planning across a couple hundred different transaction types:

 

1) GLMSelect is used so that SAS can take our list of potential independent variables and do some model specification for us (we use a holdout sample, and the error against the holdout sample is the selection criterion). We then put these specifications in a spreadsheet by writing the listing output to a text file, and an Excel macro parses this out to build a table with the specification for each transaction (e.g., the CLASS and MODEL statements).

2) We use GLMSelect again, pulling in the spreadsheet and looping through each specification so that we get all of our holdout sample results in one place (we produce charts and some tables with the accuracy against the out-of-sample data).

3) Finally, we use GLM. We again pull in the model specification from the spreadsheet, as in step 2. However, now we use all of our data (up to today) rather than a holdout sample, and we are able to generate the confidence interval bounds along with the final forecast.
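
A final step like step 3 could be sketched roughly as follows. This is not the poster's actual code: it uses sashelp.cars and the _GLSIND macro variable in place of the spreadsheet-driven specification, and the output data set and variable names (work.pred, pred, lower_mean, and so on) are hypothetical.

```sas
/* Select a model, then refit it in PROC GLM and save interval bounds */
proc glmselect data=sashelp.cars;
   class origin;
   model mpg_city = MSRP|wheelbase|weight|cylinders|origin @2;
run;

proc glm data=sashelp.cars;
   class origin;
   model mpg_city = &_GLSIND / solution clparm;
   output out=work.pred                      /* hypothetical data set name */
          predicted=pred
          lclm=lower_mean uclm=upper_mean    /* limits for the mean response */
          lcl=lower_obs   ucl=upper_obs;     /* limits for an individual prediction */
run;
quit;
```

Rows with a missing response but complete predictors are left out of the fit yet still receive predicted values and bounds in the OUTPUT data set, which is one way to obtain forecast rows.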

Rick_SAS
SAS Super FREQ

@cau83: I may not fully understand the complexity of your process, but I wonder whether any of your steps could be simplified by using the STORE statement in PROC GLMSELECT? PROC PLM produces a 'Store Information' table that contains the class variable(s) and model effects:

 

data one(drop=i j);
   array x{5} x1-x5;
   do i=1 to 1000;
      classVar = mod(i,4)+1;  
      do j=1 to 5;
         x{j} = ranuni(1);
      end;   
      y     = 3*classVar+7*x2+5*x2*x5+rannor(1);
      output;
   end;
run;
proc glmselect data=one;
   class  classVar;
   model  y = classVar x1|x2|x3|x4|x5 @2 /
                  selection=stepwise(stop=aicc);
   store out=glmselectStore;  /* STORE the model */
run;

proc plm restore=glmselectStore;
   show ClassLevels Parameters;
   score data=one(obs=3)  /* use PROC PLM to score the model */
         out=Score;
run;
cau83
Pyrite | Level 9

Thanks Rick. I was generally aware of the existence of this kind of functionality, and perhaps it's something to explore as we prepare this year. That being said, we do not always use the GLMSELECT specifications without manual changes. For instance, certain parameters (like the number of business days in a week, 4 or 5) should only be positive or only be negative, and we would remove one if its sign is wrong or if it increases the holdout sample error. Using GLMSELECT together with the listing output and Excel helps to remove friction from a process that we want to be fast, because it serves as the starting point for management rather than the end point (and we spend a lot more time on the business side of things after it's done).

 

Your original answer may be an easier way to generate the list than using the listing output and Excel. Do the automatic macro variables work with a BY statement? Or are the models at least separated by a delimiter so that we could match them back up?

Rick_SAS
SAS Super FREQ

do the automatic macro variables work with a BY statement?

 

Yes, and there is an example in the doc.
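
A minimal sketch of BY-group usage, assuming the macro variable names as I recall them from the PROC GLMSELECT documentation: _GLSNUMBYS for the number of BY groups and _GLSIND1, _GLSIND2, ... for the selected effects in each group. Please check these names against the doc example before relying on them. The macro name show_selected and the data set work.cars are hypothetical.

```sas
/* BY processing needs sorted input; sashelp.cars is not sorted by origin */
proc sort data=sashelp.cars out=work.cars;
   by origin;
run;

proc glmselect data=work.cars;
   by origin;
   model mpg_city = MSRP|wheelbase|weight|cylinders @2;
run;

/* Write the selected effects for each BY group to the log */
%macro show_selected;
   %do i = 1 %to &_GLSNUMBYS;
      %put BY group &i: &&_GLSIND&i;
   %end;
%mend show_selected;
%show_selected
```

The groups are numbered in the order in which the BY groups are processed, so the index can be matched back to the sorted BY values. With a couple hundred BY groups, also check the MAXMACRO= option, which as far as I remember caps the number of these macro variables (default 100).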
