Store AIC values from a Monte Carlo Simulation

SasStatistics · Posted 01-07-2022 11:21 AM

I create the following simulated data:

%macro monteCarloSimulation();

	%let covariates=300; /* Number of covariates (independent variables) */

	%do mcno=1 %to 100;   /* Number of simulated datasets = 100 */
		data logit_data;
		drop i j;
		array x{&covariates.} x1-x&covariates.;
		do i=1 to 1000;
		do j=1 to &covariates.;
		x{j}=ranuni(1);
		end;
		linpred=2+10*x17-8*x5+3*x2+7*x6-5*x3-12*x30+11*x130-12*x200+rand("NORMal");
		prob = exp(linpred)/ (1 + exp(linpred));
		y = (prob > 0.5);
		output;
		end;
		drop prob linpred;
		run;

		/* Here I would like to run forward stepwise regression
		and backward stepwise regression and store the corresponding AIC
		values to produce the table referenced below.
		This should be done for each table that I produce in the simulation.
		Note that 100 simulated tables are produced. */


	%end;

%mend monteCarloSimulation;

%monteCarloSimulation()

For each simulated dataset, I would like to calculate:
- AIC from a stepwise forward regression.
- AIC from a stepwise backward regression.

- If possible (I will read up on this later) AIC from a Lasso regression.

And then finally store the AIC values in a table of the format:

	AIC_Forward_Stepwise_Regression	AIC_Backward_Stepwise_Regression	AIC_Lasso
Simulation1
Simulation2
.
.
.
Simulation100

Ideally, I would also like to produce some summary statistics for evaluating which model-selection scheme performs best:

	Forward_Stepwise_Regression	Backward_Stepwise_Regression	Lasso
Mean AIC
STD
Median AIC
25% quantile
75% quantile

This would be easy to do in other programming languages, and I guess in SAS as well, but I am not used to doing statistical analysis in SAS (yet).

All help appreciated.

PaigeMiller · Posted 01-07-2022 11:28 AM

It's not clear to me what part of this process you are struggling with. Is it running regressions where you have the problem, or storing the AIC values, or creating the final table, or something else?

--
Paige Miller

SasStatistics · Posted 01-07-2022 11:30 AM

1. Running regression.
2. Store the AIC values.
3. Creating the final table.

I am very new to this in SAS.

PaigeMiller · Posted 01-07-2022 11:32 AM

Step 1 in any macro-writing process is to write working code with no macros and no macro variables, for one iteration. That's where you start. Show us the code that does stepwise regression on one iteration.

--
Paige Miller

PaigeMiller · Posted 01-07-2022 12:46 PM

In addition to my comments above, @Rick_SAS has written blogs about performing thousands of regressions, and no macros are needed. It's highly likely that this could be adapted to your Monte Carlo case (and again no macros needed). He may even have written a similar blog post for Monte Carlo simulations; either way, I'm sure there is no need for macros here.

Taking a further step back: I understand that the primary reason people run Monte Carlo simulations is to estimate the variability of estimators that have no closed-form formula for that variability. In your case, you seem to be doing a Monte Carlo simulation for situations where you have 300 covariates which are uncorrelated with each other. This corresponds to exactly zero real-world data sets: you will never find a real-world data set where the covariates are all uncorrelated (or even only slightly correlated). Every real-world data set I know of has some correlations that are not close to zero, and some that are close to (or exactly equal to) ±1. So I question the value of such a Monte Carlo study; a more valuable study would be the case where the covariates have many correlations that are not near zero and possibly some that are near ±1. So my advice is to not do this particular Monte Carlo study as you have it set up, unless it is a homework assignment.

--
Paige Miller

Rick_SAS · Posted 01-07-2022 01:21 PM

The basic outline for this kind of simulation follows:

1. If you know how to use the DATA step to simulate one sample of size N from a logistic model, then put a DO loop around the outside so that you generate B samples, each of size N.

For an example of a linear model, see "Simulate many samples from a linear regression model." For a logistic model, see the ideas in this post, although the actual simulation in that post uses PROC IML.

2. Turn off ODS and use a BY-group analysis to analyze all B samples by using one call to a procedure.

3. Use PROC MEANS or UNIVARIATE to analyze the distribution of the statistic (such as AIC) that you are studying.
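The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the 300-covariate design: the toy model (10 covariates, two true effects), the seed, and the data set names Sim, FitFwd, FitBwd, AIC_Fwd, AIC_Bwd and AIC_All are all made up for the example. It assumes the FitStatistics ODS table from PROC LOGISTIC has the columns Criterion and InterceptAndCovariates, and it keeps the last AIC row in each BY group as the AIC of the final selected model. Note that SELECTION=FORWARD and SELECTION=BACKWARD in PROC LOGISTIC enter and remove effects based on significance levels, not on AIC, and a sample in which forward selection enters no effect may contribute no row.

```sas
%let N = 200;   /* sample size (illustrative) */
%let B = 50;    /* number of simulated samples (illustrative) */

/* Step 1: B samples of size N in ONE data set, indexed by SampleID */
data Sim;
   call streaminit(2022);                  /* arbitrary seed */
   array x{10} x1-x10;
   do SampleID = 1 to &B;
      do i = 1 to &N;
         do j = 1 to 10;
            x{j} = rand("Uniform");
         end;
         eta = 1 + 2*x1 - 3*x2;            /* toy linear predictor */
         y = rand("Bernoulli", logistic(eta));
         output;
      end;
   end;
   drop i j eta;
run;

/* Step 2: turn off displayed output; one BY-group call per method */
ods exclude all;
options nonotes;
proc logistic data=Sim;
   by SampleID;
   model y(event='1') = x1-x10 / selection=forward;
   ods output FitStatistics=FitFwd;
run;
proc logistic data=Sim;
   by SampleID;
   model y(event='1') = x1-x10 / selection=backward;
   ods output FitStatistics=FitBwd;
run;
options notes;
ods exclude none;

/* Keep the AIC from the last selection step of each sample */
data AIC_Fwd;
   set FitFwd(where=(Criterion="AIC"));
   by SampleID;
   if last.SampleID;
   AIC_Forward = InterceptAndCovariates;
   keep SampleID AIC_Forward;
run;
data AIC_Bwd;
   set FitBwd(where=(Criterion="AIC"));
   by SampleID;
   if last.SampleID;
   AIC_Backward = InterceptAndCovariates;
   keep SampleID AIC_Backward;
run;

/* One row per sample, one column per selection method */
data AIC_All;
   merge AIC_Fwd AIC_Bwd;
   by SampleID;
run;

/* Step 3: summarize the distribution of AIC across samples */
proc means data=AIC_All mean std median q1 q3;
   var AIC_Forward AIC_Backward;
run;
```

AIC_All has the one-row-per-simulation layout asked for in the question, and the PROC MEANS call gives the mean, standard deviation, median and quartiles for each column.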

I would like to point out that your simulation from a logistic model is not correct. You put the "randomness" in the wrong location. Instead of

linpred = <linear combination> + rand("NORMal");
prob = exp(linpred)/ (1 + exp(linpred));
y = (prob > 0.5);

the correct formula is

linpred = <linear combination>;     /* 2. linear model */
mu = logistic(linpred);             /* 3. transform by inverse logit */
y = rand("Bernoulli", mu);          /* 4. Simulate binary response */
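Applied to the model in the original question, the corrected simulation might look like the sketch below. The coefficients, the 1000 observations, the 300 covariates and the 100 samples come from the original post; the data set name Sim, the SampleID variable and the seed value are assumptions made for illustration. The outer DO loop replaces the %DO loop, so no macro is needed. It also avoids a side effect of the original code: calling ranuni(1) with a fixed positive seed in a DATA step that is re-run by the macro loop starts the same random stream on every pass, so each iteration would regenerate the same covariates.

```sas
%let N = 1000;   /* observations per sample (from the question) */
%let B = 100;    /* number of simulated samples (from the question) */
%let p = 300;    /* number of covariates (from the question) */

data Sim;
   call streaminit(12345);               /* arbitrary seed, set once */
   array x{&p} x1-x&p;
   do SampleID = 1 to &B;                /* replaces the %DO loop */
      do i = 1 to &N;
         do j = 1 to &p;
            x{j} = rand("Uniform");
         end;
         /* linear predictor: no random term is added here */
         linpred = 2 + 10*x17 - 8*x5 + 3*x2 + 7*x6 - 5*x3
                     - 12*x30 + 11*x130 - 12*x200;
         mu = logistic(linpred);         /* inverse logit */
         y  = rand("Bernoulli", mu);     /* the randomness enters here */
         output;
      end;
   end;
   drop i j linpred mu;
run;
```

The result is a single data set with &B * &N rows, which can be analyzed with a BY SampleID statement.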
