02-25-2016 12:04 PM
I am trying to do a simulation but i am not sure how to do the code.
I have a linear equation
so here i am trying to obtain the values of y and b2 when I fix b1 first at 2 then 0
How do i do that?
02-25-2016 05:23 PM
I like mathematical questions, but I'm a bit unsure what you mean by b1 and b2. Do these refer to the coefficients 0.05 and 1.28 in your equation? Or did you just mistype X1 and X2 (cf. your post of 14 Feb)? And what can we assume about the ranges (or distributions?) of X1 and X2 and about c?
02-26-2016 01:29 PM - edited 02-26-2016 01:31 PM
Good to know it's a regression analysis. So, we have to simulate n data points (x1, x2, y), say, n=50.
Let's assume that x1 can vary in a range (x1min, x1max) and x2 in a range (x2min, x2max). You can define these ranges as appropriate. As you didn't make further assumptions about x1 and x2, it should be fine to simulate them as independent random samples from uniform distributions.
You specified b1=2 for the first simulation and b1=0 for the second and in both cases b2=1.28. For simulating the y values, however, we need the remaining parameter, c, as well. In the code below it is randomly selected from a uniform distribution on [-5, 5]. You can easily adapt this range to your needs. Moreover, we need to specify the standard deviation s>0 of the independent N(0,s)-distributed error terms (in the example: s=1), without which all data points would lie on the plane determined by the parameters, so that regression analysis would be pointless.
/* Simulation of linear regression model */ data simdata; call streaminit(2718); /* initialize random number generator */ b1=2; * b1=0; /* your choices for b1 */ b2=1.28; /* set b2 */ c=10*rand('uniform')-5; /* set c arbitrarily */ n=50; /* number of observations */ x1min=-5; x1max=10; /* range for x1 */ x2min= 3; x2max=8; /* range for x2 */ s=1; /* err. std. dev. */ put (b1--s)(=); /* write parameters to the log */ do i=1 to n; x1=x1min+(x1max-x1min)*rand('uniform'); x2=x2min+(x2max-x2min)*rand('uniform'); y=c+b1*x1+b2*x2+rand('normal',0,s); output; /* write simulated data to SIMDATA */ end; keep y x1 x2; run;
Alternatively, one could store the model parameters in a separate dataset and create only a view of the simulated data: please see this recent post from Super User PGStats for an example.
To perform linear regression on the simulated data (which gives you the desired estimate of b2) and to obtain a prediction of y at a point (x1, x2) of your choice (in the example below I chose (4, 7)), you can use the following code (please omit the RESTRICT statement if you didn't mean to fix parameter value b1 in advance):
proc reg data=simdata outest=est; model y=x1 x2; restrict x1=2; * x1=0; /* your choices for b1 */ run; quit; data _null_; x1=4; /* the x1 and x2 values for which */ x2=7; /* you want to predict y */ set est(rename=(x1=b1 x2=b2 intercept=c)); y=c+b1*x1+b2*x2; put 'Estimate of b2 = ' b2; put 'Prediction at x1=' x1 'and x2=' x2 +(-1) ': y=' y; run;
Please feel free to insert a calculation such as y=c+b1*4+b2*7; into the simulation data step (before the line s=1;) so as to obtain the value of y for the true model parameters c, b1 and b2. You can then compare this value to the predicted value.
As you can see, the above code uses b1=2. The case b1=0 is just commented out.