deleted_user
Not applicable
Hello,

Could you please suggest how I can obtain the estimate for a new observation from the regression model that I have? Suppose the regression equation is:

Y = a + b*A + c*B + d*C, where Y is the dependent variable of interest and A, B, and C are the variables in my model. Previously I had dropped two variables, D and E, from the model. If I now want the estimate of Y, and the 95% C.I. for this estimate, for a new observation with new values of A, B, C, D, and E, what should I use?

The code I used:

/* 95% confidence limits for the score */
DATA ci_new;
SET sc.sales (DROP = D E);
RUN;
PROC UNIVARIATE DATA = ci_new;
VAR Y;
RUN;

With this I only get the mean of Y and a C.I. for it. But if I plug the values of A, B, and C into the model that I have, I get a value that doesn't even fall in this interval.

My objective is to estimate the value of Y using the model that I have, viz. Y = a + b*A + c*B + d*C, and to get the 95% C.I. limits for this estimate.

Kindly guide,
markc
Paige
Quartz | Level 8
Mark, you need to use PROC REG for regression.

PROC UNIVARIATE doesn't do regression.

If you had already fit the model, then the way to get predicted value and confidence interval for the new observation is simply to append the new observation to your data set, with Y missing but with A B and C non-missing at whatever value you desire, and then fit the regression again. The output data set will contain the predicted value and confidence interval for all of your data, and for the new observation.
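As a rough illustration of the append-and-refit approach described above, here is a minimal sketch. The data set names new_obs, sales_plus, and pred, the flag variable is_new, and the values 10, 20, and 30 are placeholders invented for the example; only sc.sales and the variables Y, A, B, C, D, and E come from the question.

```sas
/* Hypothetical new observation. Y is missing, so the row is   */
/* excluded from the fit but still receives a prediction.      */
data new_obs;
   A = 10; B = 20; C = 30;      /* placeholder values */
   Y = .;
   is_new = 1;                  /* flag used only to find the row later */
run;

/* Append the new row to the original data (D and E were dropped) */
data sales_plus;
   set sc.sales(drop=D E) new_obs;
run;

proc reg data=sales_plus;
   model Y = A B C;
   /* P = predicted value; LCLM/UCLM = 95% limits for the mean;   */
   /* LCL/UCL = 95% prediction limits for an individual response  */
   output out=pred p=yhat lclm=lo_mean uclm=hi_mean lcl=lo_ind ucl=hi_ind;
run;
quit;

proc print data=pred;
   where is_new = 1;
run;
```

LCLM and UCLM bound the expected value of Y at the new A, B, C, while LCL and UCL bound a single future observation and are therefore wider. Which pair is appropriate depends on what "the estimate" is meant to be.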

Regarding the code you posted: even though it doesn't solve the problem, I suggest a simplification should you ever need PROC UNIVARIATE:

PROC UNIVARIATE data = sc.sales(drop=D E);
VAR Y;
RUN;
IljaMett
Calcite | Level 5
Hi,

I just found this thread using the search engine and decided to warm it up, rather than start a new one.

I have much the same question as Mark. However, since my initial regression is based on a huge data set, appending new observations and rerunning the regression each time I get some new observations is impossible simply due to time constraints (a regression run takes about 4 hours).

I currently extract the regression coefficients using the outest= option and apply them to my prediction data set using Proc Score. However, I only get the predicted values like that, without individual confidence limits, which would be highly desirable.
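For readers unfamiliar with this workflow, a minimal sketch follows. It assumes the model from the original question (Y on A, B, C in sc.sales). The names est, new_data, and scored are hypothetical.

```sas
/* Fit once and save the coefficients */
proc reg data=sc.sales outest=est noprint;
   model Y = A B C;
run;
quit;

/* Apply the saved coefficients to new observations.            */
/* new_data is a hypothetical data set containing A, B, and C.  */
proc score data=new_data score=est type=parms predict out=scored;
   var A B C;
run;
```

The predicted value appears in the output data set under the model label (MODEL1 unless the MODEL statement was given a label). As noted, this produces point predictions only, with no standard errors or confidence limits.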

I have already thought about computing them by hand, but I have no idea how to compute the necessary hat values (h_i) without using PROC IML, which we haven't licensed 😞

I would very much appreciate any help!
SteveDenham
Jade | Level 19
Do you know the values of the independent variables that you will want predictions for prior to running the regression? If so, simply append them as observations, with the dependent variable set to missing (=.). The output data set should contain predicted values, standard errors, confidence limits, etc. for all observations in the dataset, so these can quickly be extracted.

Steve Denham
IljaMett
Calcite | Level 5
Sorry, I don't know the independent values in advance. If I did, then it wouldn't be a new data set, I guess? 🙂

Any other ideas, someone?
SteveDenham
Jade | Level 19
How many independent variables are you dealing with? Are there a priori values that you will be interested in? A cartesian join of all the values generates a new data set, to be appended to the source data, with missing dependent values. If you are only interested in particular values that arise after gathering more data, then you are stuck with re-running everything.
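The cartesian-join idea above can be sketched with nested DO loops. The value ranges below are placeholders, not values from the thread, and the data set names grid and sales_plus are hypothetical.

```sas
/* Build every combination of the candidate values, with Y missing */
data grid;
   Y = .;
   do A = 1 to 3;                 /* placeholder values */
      do B = 0, 1;                /* placeholder values */
         do C = 10 to 30 by 10;   /* placeholder values */
            output;
         end;
      end;
   end;
run;

/* Append the grid to the source data before fitting the model */
data sales_plus;
   set sc.sales(drop=D E) grid;
run;
```

This example produces 3 x 2 x 3 = 18 extra rows. The row count grows multiplicatively with each variable, so the approach is only practical for a modest number of candidate values.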

Steve Denham
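A possible way to get confidence limits without refitting and without PROC IML, offered as a sketch rather than a tested solution: PROC REG can write the covariance matrix of the estimates to the OUTEST= data set (COVOUT option) along with the error degrees of freedom (EDF option). For a new design row x0 = (1, A, B, C), the variance of the predicted mean is x0' V x0, where V = s^2 (X'X)^-1 is that covariance matrix, which is s^2 times the hat value asked about earlier in the thread. The data set names est, new_data, and scored are hypothetical, and the code assumes the three-regressor model from the original question.

```sas
/* Fit once; save coefficients, their covariance matrix, and error df */
proc reg data=sc.sales outest=est covout edf noprint;
   model Y = A B C;
run;
quit;

/* new_data is a hypothetical data set with numeric A, B, and C */
data scored(keep=A B C yhat se_mean lo_mean hi_mean lo_ind hi_ind);
   array beta[4]  _temporary_;     /* intercept, A, B, C estimates     */
   array cov[4,4] _temporary_;     /* covariance matrix of estimates   */
   array e[4] e0-e3;               /* one row of EST after renaming    */
   array x[4] x0 A B C;            /* design row for a new observation */
   retain rmse edf;

   /* Load the estimates once, on the first iteration */
   if _n_ = 1 then do until (last);
      set est(keep=_TYPE_ _NAME_ _RMSE_ _EDF_ Intercept A B C
              rename=(Intercept=e0 A=e1 B=e2 C=e3)) end=last;
      if _TYPE_ = 'PARMS' then do;
         do j = 1 to 4; beta[j] = e[j]; end;
         rmse = _RMSE_;            /* root mean squared error, s */
         edf  = _EDF_;             /* error degrees of freedom   */
      end;
      else if _TYPE_ = 'COV' then do;
         /* _NAME_ identifies which parameter the row belongs to */
         select (upcase(_NAME_));
            when ('INTERCEPT') r = 1;
            when ('A')         r = 2;
            when ('B')         r = 3;
            when ('C')         r = 4;
            otherwise          r = .;
         end;
         if r > 0 then do j = 1 to 4; cov[r, j] = e[j]; end;
      end;
   end;

   set new_data;
   x0 = 1;                         /* intercept column */

   yhat = 0;
   q = 0;                          /* q = x0' V x0 */
   do i = 1 to 4;
      yhat = yhat + beta[i] * x[i];
      do j = 1 to 4;
         q = q + x[i] * cov[i, j] * x[j];
      end;
   end;

   t = tinv(0.975, edf);           /* two-sided 95% critical value */
   se_mean = sqrt(q);
   lo_mean = yhat - t * se_mean;   /* limits for the mean response */
   hi_mean = yhat + t * se_mean;
   se_ind  = sqrt(q + rmse**2);
   lo_ind  = yhat - t * se_ind;    /* limits for a single new response */
   hi_ind  = yhat + t * se_ind;
run;
```

It would be sensible to check the result against the OUTPUT statement of PROC REG on a small data set before relying on it, since the layout of the OUTEST= data set (the _TYPE_ and _NAME_ values) is assumed here from the documentation rather than verified on this model.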
lvm
Rhodochrosite | Level 12
If you are using SAS 9.22 (really, 9.2 with the TS2M3 release version), you may have a way, provided you can do the regression with PROC GLM instead of PROC REG. There is now a post-model-fitting procedure called PLM. It is designed for performing tests (contrasts, etc.) after fitting a complex model, often with many random effects and large data sets. PLM can also be used for getting predictions of the response variable, using the SCORE statement. Here is a simple example. You can read more about it in the documentation.
Remember, you need 9.22 for this to run.
data a; *--data for fitting the model;
input x1 x2 y;
datalines;
0 5 2
1 5 4
2 4 3
3 4 6
4 3 6
5 3 5
6 3 8
7 2 10
8 1 9
9 3 9
;
data test; *--this is the new data for getting predictions post-model fitting;
input x1 x2;
datalines;
0.5 4
1 1
2 7
2 8
2 9
3 0
3 1
;
run;
proc print data=a;run;
proc glm data=a; *--fit model;
model y = x1 x2 / solution;
store sasuser.a_analysis / label='test'; *--new option with 9.22 (Item Store);
run;
proc plm source=sasuser.a_analysis; *--new procedure, using Item Store;
score data=test out=test_out predicted stderr lclm uclm lcl ucl; *--get predictions;
run;
proc print data=test_out;
run;
lvm
Rhodochrosite | Level 12
Also, PROC PLM is designed to be used with several procedures, such as MIXED, GLIMMIX, and others, but not with REG.
IljaMett
Calcite | Level 5
Hi lvm,

Thanks for your tip, that sounds reasonable. We don't have the right SAS version here yet, but we will probably get it soon.
I'll take a look at the procedure.
lvm
Rhodochrosite | Level 12
If you can do the regression in REG, you should be able to do it in GLM. With the use of the new STORE statement, you create a permanent file (the Item Store) on your hard drive that contains all the needed information for later calculations (very cool). Then, you don't have to ever run the GLM procedure again (unless you want a different model). You simply run the new PLM procedure that accesses your Item Store and new X data file, and the relevant statistics (contrasts, predictions) are calculated.
You do need 9.22 or later; hopefully you get it soon. You can read more at:
http://support.sas.com/resources/papers/proceedings10/258-2010.pdf
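
To illustrate the reuse described above, here is a minimal sketch of a later session that scores a fresh batch of data without refitting. It assumes the item store sasuser.a_analysis created by the earlier example still exists. The data set names new_x and new_x_out and the data values are hypothetical.

```sas
/* A later batch of new predictor values (hypothetical) */
data new_x;
   input x1 x2;
   datalines;
4.5 2
6 6
;
run;

/* No PROC GLM step here: the stored fit is read back from the item store */
proc plm restore=sasuser.a_analysis;
   show parameters;                 /* display the stored estimates */
   score data=new_x out=new_x_out predicted stderr lclm uclm lcl ucl;
run;

proc print data=new_x_out;
run;
```

Each new batch only costs a PROC PLM run, which does not revisit the original training data.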
