turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Simulation

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-25-2016 12:04 PM

I am trying to do a simulation but i am not sure how to do the code.

I have a linear equation

y=c+0.05X1+1.28X2

so here i am trying to obtain the values of y and b2 when I fix b1 first at 2 then 0

How do i do that?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-25-2016 05:23 PM

Hello @kudamakoni100,

I like mathematical questions, but I'm a bit unsure what you mean by b1 and b2. Do these refer to the coefficients 0.05 and 1.28 in your equation? Or did you just mistype X1 and X2 (cf. your post of 14 Feb)? And what can we assume about the ranges (or distributions?) of X1 and X2 and about c?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-25-2016 09:24 PM

Yes they refer to coefficients its a regression analysis

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-26-2016 01:29 PM - edited 02-26-2016 01:31 PM

Good to know it's a regression analysis. So, we have to simulate n data points (x1, x2, y), say, n=50.

Let's assume that x1 can vary in a range (x1min, x1max) and x2 in a range (x2min, x2max). You can define these ranges as appropriate. As you didn't make further assumptions about x1 and x2, it should be fine to simulate them as independent random samples from uniform distributions.

You specified b1=2 for the first simulation and b1=0 for the second and in both cases b2=1.28. For simulating the y values, however, we need the remaining parameter, c, as well. In the code below it is randomly selected from a uniform distribution on [-5, 5]. You can easily adapt this range to your needs. Moreover, we need to specify the standard deviation s>0 of the independent N(0,s)-distributed error terms (in the example: s=1), without which all data points would lie on the plane determined by the parameters, so that regression analysis would be pointless.

```
/* Simulation of linear regression model */
data simdata;
call streaminit(2718); /* initialize random number generator */
b1=2; * b1=0; /* your choices for b1 */
b2=1.28; /* set b2 */
c=10*rand('uniform')-5; /* set c arbitrarily */
n=50; /* number of observations */
x1min=-5; x1max=10; /* range for x1 */
x2min= 3; x2max=8; /* range for x2 */
s=1; /* err. std. dev. */
put (b1--s)(=); /* write parameters to the log */
do i=1 to n;
x1=x1min+(x1max-x1min)*rand('uniform');
x2=x2min+(x2max-x2min)*rand('uniform');
y=c+b1*x1+b2*x2+rand('normal',0,s);
output; /* write simulated data to SIMDATA */
end;
keep y x1 x2;
run;
```

Alternatively, one could store the model parameters in a separate dataset and create only a view of the simulated data: please see this recent post from Super User PGStats for an example.

To perform linear regression on the simulated data (which gives you the desired estimate of b2) and to obtain a prediction of y at a point (x1, x2) of your choice (in the example below I chose (4, 7)), you can use the following code (please omit the RESTRICT statement if you didn't mean to fix parameter value b1 in advance):

```
proc reg data=simdata outest=est;
model y=x1 x2;
restrict x1=2; * x1=0; /* your choices for b1 */
run;
quit;
data _null_;
x1=4; /* the x1 and x2 values for which */
x2=7; /* you want to predict y */
set est(rename=(x1=b1 x2=b2 intercept=c));
y=c+b1*x1+b2*x2;
put 'Estimate of b2 = ' b2;
put 'Prediction at x1=' x1 'and x2=' x2 +(-1) ': y=' y;
run;
```

Please feel free to insert a calculation such as y=c+b1*4+b2*7; into the simulation data step (before the line s=1;) so as to obtain the value of y for the true model parameters c, b1 and b2. You can then compare this value to the predicted value.

As you can see, the above code uses b1=2. The case b1=0 is just commented out.