Posted 03-09-2010 01:07 AM
(5205 views)
Hello,
Could you please suggest how I can obtain the estimate for a new observation from the regression model that I already have? Suppose that the regression equation is:
Y = a + b*A + c*B + d*C, where Y is the dependent variable of interest and A, B, and C are the variables in my model. Previously I had dropped two variables, D and E, from the model. Now I want to find the estimate for Y, and the 95% C.I. for this estimate, for a new observation with new values for A, B, C, D, and E. What should I use?
The code I used:
/*95% Confidence Limits for the Score*/
DATA ci_new;
set sc.sales (DROP = D E);
PROC UNIVARIATE data = ci_new;
VAR Y;
RUN;
With this I only get the mean of Y and the C.I. for it. But if I plug the values of A, B, and C into the model that I have, I get a value that doesn't even fall in this interval.
My objective is to estimate the value of Y using the model that I have, viz. Y = a + b*A + c*B + d*C, along with the 95% C.I. limits for this estimate.
Kindly guide,
markc
9 REPLIES
Mark, you need to use PROC REG for regression.
PROC UNIVARIATE doesn't do regression.
If you have already fit the model, the way to get the predicted value and confidence interval for the new observation is simply to append the new observation to your data set, with Y missing but with A, B, and C non-missing at whatever values you desire, and then fit the regression again. Because Y is missing, the new observation does not affect the fitted coefficients. The output data set will contain the predicted value and confidence interval for all of your data, and for the new observation.
Regarding the code you posted: even though it doesn't solve the problem, I suggest a simplification should you ever need PROC UNIVARIATE:
PROC UNIVARIATE data = sc.sales(drop=D E);
VAR Y;
RUN;
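The append-and-refit approach described in this reply could look roughly like the sketch below. The data set names new_obs, sales_plus, and pred, the output variable names, and the values 10, 20, and 30 are placeholders for illustration only; it also assumes A, B, C, and Y are numeric and that sc.sales has no missing Y of its own.

```sas
/* Hypothetical new observation: Y is missing, A B C hold the values of interest. */
data new_obs;
   A = 10; B = 20; C = 30;   /* placeholder values, not from the thread */
   Y = .;
run;

/* Append the new row to the original data (D and E are not in the model). */
data sales_plus;
   set sc.sales(drop=D E) new_obs;
run;

/* The row with missing Y is left out of the fit but still gets predictions. */
proc reg data=sales_plus;
   model Y = A B C;
   output out=pred p=yhat
          lclm=mean_lo uclm=mean_hi   /* 95% limits for the mean of Y */
          lcl=ind_lo   ucl=ind_hi;    /* 95% limits for an individual Y */
run;
quit;

/* Show only the appended observation. */
proc print data=pred;
   where Y is missing;
run;
```

LCLM/UCLM give the confidence limits for the expected value of Y, while LCL/UCL give the wider limits for a single new observation; which pair is wanted depends on what "C.I. for this estimate" is meant to cover.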
Hi,
I just found this thread using the search engine and decided to revive it rather than start a new one.
My question is much the same as Mark's. However, since my initial regression is based on a huge data set, appending new observations and rerunning the regression each time new observations arrive is impractical because of the run time (a regression run takes about 4 hours).
I currently extract the regression coefficients using the OUTEST= option and apply them to my prediction data set using PROC SCORE. However, that only gives me the predicted values, without individual confidence limits, which would be highly desirable.
I have already thought about computing them by hand, but I don't know how to compute the necessary hat values (h_i) without using PROC IML, which we haven't licensed 😞
I would very much appreciate any help!
Message was edited by: IljaMett
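For reference, the by-hand computation mentioned here rests on the standard ordinary least squares formulas, sketched below. The symbols are introduced for illustration and are not from the thread: x_0 is the new row of regressors (with a leading 1 for the intercept), b the estimated coefficients, s the root mean squared error, n the number of observations used in the fit, and p the number of estimated parameters.

```latex
\begin{align*}
\hat{y}_0 &= x_0^{\top} b, \qquad h_0 = x_0^{\top} (X^{\top} X)^{-1} x_0 \\
\text{95\% limits for the mean:} \quad & \hat{y}_0 \pm t_{0.975,\,n-p}\; s \sqrt{h_0} \\
\text{95\% limits for an individual value:} \quad & \hat{y}_0 \pm t_{0.975,\,n-p}\; s \sqrt{1 + h_0} \\
s^2 h_0 &= x_0^{\top}\, \widehat{\mathrm{Cov}}(b)\, x_0
\end{align*}
```

The last line suggests one possible route without PROC IML: adding the COVOUT option alongside OUTEST= in PROC REG writes the covariance matrix of the estimates to the same data set (the root mean squared error is in its _RMSE_ variable), so the quadratic form could in principle be summed term by term in a DATA step. This is a sketch of the idea rather than a tested recipe.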
Do you know the values of the independent variables that you will want predictions for prior to running the regression? If so, simply append them as observations, with the dependent variable set to missing (=.). The output data set should contain predicted values, standard errors, confidence limits, etc. for all observations in the dataset, so these can quickly be extracted.
Steve Denham
Sorry, I don't know the independent values in advance. If I did, it wouldn't be a new data set, I guess? 🙂
Any other ideas, anyone?
How many independent variables are you dealing with? Are there a priori values that you will be interested in? If so, a Cartesian join of all those values generates a new data set that can be appended to the source data with the dependent variable set to missing. If you are only interested in particular values that arise after gathering more data, then you are stuck with re-running everything.
Steve Denham
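A minimal sketch of the Cartesian-join idea, assuming three numeric regressors named A, B, and C and a fitting data set called fit_data. All data set names and candidate values here are hypothetical, chosen only to illustrate the pattern.

```sas
/* Hypothetical candidate values for each regressor. */
data a_vals; do A = 1 to 3;           output; end; run;
data b_vals; do B = 10 to 30 by 10;   output; end; run;
data c_vals; do C = 0 to 1;           output; end; run;

/* Cartesian join: every combination of A, B, and C, with Y set to missing. */
proc sql;
   create table grid as
   select a_vals.A, b_vals.B, c_vals.C, . as Y
   from a_vals, b_vals, c_vals;
quit;

/* Append the grid to the source data before running the regression once. */
data fit_plus_grid;
   set fit_data grid;
run;
```

The grid grows multiplicatively with the number of regressors and candidate values, so this is only practical when the values of interest are few and known before the long regression run.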
If you are using SAS 9.22 (really, 9.2 at the TS2M3 release), you may have a way, provided you can do the regression with PROC GLM instead of PROC REG. There is now a post-model-fitting procedure called PLM. It is designed for performing tests (contrasts, etc.) after fitting a complex model, often with many random effects and large data sets. PLM can also be used to get predictions of the response variable, using the SCORE statement. Here is a simple example. You can read more about it in the documentation.
Remember, you need 9.22 for this to run.
data a; *--data for fitting the model;
input x1 x2 y;
datalines;
0 5 2
1 5 4
2 4 3
3 4 6
4 3 6
5 3 5
6 3 8
7 2 10
8 1 9
9 3 9
;
data test; *--this is the new data for getting predictions post-model fitting;
input x1 x2;
datalines;
0.5 4
1 1
2 7
2 8
2 9
3 0
3 1
;
run;
proc print data=a;run;
proc glm data=a; *--fit model;
model y = x1 x2 / solution;
store sasuser.a_analysis / label='test'; *--new option with 9.22 (Item Store);
run;
proc plm source=sasuser.a_analysis; *--new procedure, using Item Store;
score data=test out=test_out predicted stderr lclm uclm lcl ucl; *--get predictions;
run;
proc print data=test_out;
run;
Also, PROC PLM is designed to be used with several procedures, such as MIXED, GLIMMIX, and others, but not with REG.
Hi Ivm,
Thanks for your tip, that sounds reasonable. We don't have the right SAS version here yet, but we will probably get it soon.
I'll take a look at the procedure.
If you can do the regression in REG, you should be able to do it in GLM. With the new STORE statement, you create a permanent file (the Item Store) on your hard drive that contains all the information needed for later calculations (very cool). Then you never have to run the GLM procedure again (unless you want a different model). You simply run the new PLM procedure, which reads your Item Store and the new X data file and calculates the relevant statistics (contrasts, predictions).
You do need 9.22 or later; hopefully you get it soon. You can read more at:
http://support.sas.com/resources/papers/proceedings10/258-2010.pdf
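To illustrate the "never run GLM again" point, a later session might score freshly arrived data against the stored model roughly as follows. This assumes the item store sasuser.a_analysis from the earlier example already exists; the data set new_x, its values, and the output name new_pred are hypothetical. The sketch uses RESTORE=, the option name given in the PROC PLM documentation, where the earlier example in this thread writes SOURCE=.

```sas
/* Later session: the 4-hour model fit is not repeated. */
/* new_x is a hypothetical data set holding only the regressors x1 and x2. */
data new_x;
   input x1 x2;
   datalines;
4 2
6 5
;
run;

/* Read the stored model and compute predictions with both kinds of limits. */
proc plm restore=sasuser.a_analysis;
   score data=new_x out=new_pred predicted stderr lclm uclm lcl ucl;
run;

proc print data=new_pred;
run;
```

Only the PLM step needs to be rerun each time new observations arrive, which is what makes this attractive when the original fit is slow.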