nsns
Obsidian | Level 7

Hello,

I am trying to extrapolate a regression line for stability test-retest data. I need to show the point at which the 95% confidence limit, when extended, hits the acceptance criterion. I have data up to 18 months and want to extrapolate the line to 24 months. The guidelines require that the 95% confidence limit be extended to see where it hits the acceptance criterion. If I include the extra time points (i.e., 21 and 24 months) in the input data with missing results, PROC REG gives me predicted values for those time points as well, along with confidence limits. So I actually have my answer. My question is: how are these confidence intervals calculated?


Example code looks as follows:

data a;
input time result;
cards;
0 13
3 23
6 26
9 32
12 30
15 33
18 34
21 .
24 .
;
run;

proc reg data=a;
model result=time;
output out=reg p=pred uclm=upper lclm=lower;
run;
quit;

 

Accepted Solutions
Rick_SAS
SAS Super FREQ

Others have already raised questions about whether PROC REG is the appropriate tool, but the answer to your question is found in the SAS documentation for predicted values.  See the equations for LowerM and UpperM.
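For a one-predictor model like the one in the question, the LowerM and UpperM limits reduce to the textbook confidence interval for the mean response. The following is a sketch of that special case; the symbols n, x-bar, S_xx, s, and x_0 are notation introduced here, not taken from the thread.

```latex
\[
\hat{y}_0 = b_0 + b_1 x_0, \qquad
s^2 = \frac{\mathrm{SSE}}{n-2}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
\[
\text{95\% limits for the mean at } x_0:\quad
\hat{y}_0 \;\pm\; t_{0.975,\,n-2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\]
% Worked check with the example data (7 non-missing rows):
% n = 7, \bar{x} = 9, S_{xx} = 252, b_0 \approx 17.9643, b_1 \approx 1.0357,
% s^2 \approx 12.2214 (s \approx 3.4959), t_{0.975,5} \approx 2.5706.
\[
x_0 = 24:\quad
42.8214 \pm 2.5706 \times 3.4959 \times \sqrt{\tfrac{1}{7} + \tfrac{15^2}{252}}
\approx 42.8214 \pm 9.1456 = (33.6758,\; 51.9670)
\]
\[
x_0 = 21:\quad
39.7143 \pm 2.5706 \times 3.4959 \times \sqrt{\tfrac{1}{7} + \tfrac{12^2}{252}}
\approx 39.7143 \pm 7.5950 = (32.1193,\; 47.3093)
\]
```

Only the seven rows with a non-missing result enter the fit; the rows for months 21 and 24 are scored but do not affect the estimates. The half-width grows with the squared distance of x_0 from the mean time, which is why the limits widen as the line is extended.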


11 REPLIES
Reeza
Super User

Is this an academic question, or are you trying to implement this at your company?

Linear regression is not appropriate for time series data because the assumption of independence is violated.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_reg_details33.htm

Prediction intervals for future observations are calculated differently from confidence intervals at known data points.
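To see the two calculations side by side, a minimal sketch along these lines may help. It assumes the data set A from the original post (7 non-missing rows, so 5 error degrees of freedom); the output data set and variable names are made up for illustration. It asks PROC REG for both kinds of limits and then rebuilds them by hand from the standard errors.

```sas
/* Assumes data set A from the original post. */
proc reg data=a noprint;
  model result = time;
  output out=reg2 p=pred
         stdp=se_mean  lclm=lower_mean  uclm=upper_mean    /* mean response        */
         stdi=se_indiv lcl=lower_indiv  ucl=upper_indiv;   /* single future value  */
run;
quit;

data check;
  set reg2;
  t = tinv(0.975, 5);                    /* two-sided 95%, df = 7 - 2 = 5 */
  lower_mean_hand  = pred - t*se_mean;   /* should agree with lower_mean  */
  upper_mean_hand  = pred + t*se_mean;   /* should agree with upper_mean  */
  lower_indiv_hand = pred - t*se_indiv;  /* should agree with lower_indiv */
  upper_indiv_hand = pred + t*se_indiv;  /* should agree with upper_indiv */
run;

proc print data=check noobs;
run;
```

The limits for an individual value are always wider than those for the mean because STDI adds the residual variance to the variance of the fitted mean.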
https://stats.stackexchange.com/questions/16493/difference-between-confidence-intervals-and-predicti...


EDIT:

The technique you're using is outlined here:

https://blogs.sas.com/content/iml/2014/02/17/the-missing-value-trick-for-scoring-a-regression-model....

 

Other options for scoring:

https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html

 

nsns
Obsidian | Level 7
Dear Reeza,
Thank you for your response. I will check out the references you sent. In the meantime, I wanted to answer your question.
I am analyzing stability data and following the FDA guideline Q1E Evaluation of Stability Data.
(The data I sent was example data to illustrate the question.)
Thanks.
Reeza
Super User

Drug stability over time? Doesn't that usually require survival analysis?

It's been a while since I've done a clinical trial though (a decade).

nsns
Obsidian | Level 7

 

The guidelines say specifically that "Regression analysis is considered an appropriate approach to evaluating the stability data for a quantitative attribute and establishing a retest period or shelf life." They also say, "An appropriate approach to retest period or shelf life estimation is to analyze a quantitative attribute (e.g., assay, degradation products) by determining the earliest time at which the 95 percent confidence limit for the mean intersects the proposed acceptance criterion".

The figure in the guidelines shows data collected up to 12 months and then an extrapolation showing that the degradation would be within the acceptance range at 24 months. (The guidelines have rules for how far you may extrapolate, and the estimate will need to be updated with longer-term data, but extrapolation itself is acceptable.)
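One way to locate the intersection numerically is to score a fine grid of future times using the same missing-value approach. The sketch below assumes the data set A from the original post; the acceptance criterion of 45, the grid range and step, and the data set names are all hypothetical values chosen for illustration. The example results increase over time, so the upper limit is the one compared; for an attribute that decreases, the lower limit would be compared instead.

```sas
%let limit = 45;   /* hypothetical acceptance criterion, not from the thread */

/* Scoring rows: a grid of future times with a missing response */
data grid;
  do time = 18 to 36 by 0.25;
    result = .;
    output;
  end;
run;

/* Observed rows drive the fit; grid rows are scored only */
data both;
  set a(where=(result is not missing)) grid;
run;

proc reg data=both noprint;
  model result = time;
  output out=scored p=pred lclm=lower uclm=upper;
run;
quit;

/* Keep the first grid time whose upper limit reaches the criterion */
data crossing;
  set scored;
  if result = . and upper >= &limit then do;
    output;
    stop;
  end;
run;

proc print data=crossing noobs;
run;
```

The answer is only as fine as the grid step (0.25 months here), and CROSSING is empty if the limit is not reached within the grid range.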

 

[Image: nsns_0-1635320648657.png, the figure from the guidelines described above]

So I think that I am ok with using the regression procedure. 

The output dataset that is generated using the sample data above includes the missing points.  This is what it generates:

 

time  result  Predicted Value  Lower Bound of 95%  Upper Bound of 95%
              of result        C.I. for Mean       C.I. for Mean
   0      13  17.9643          11.841              24.0876
   3      23  21.0714          16.2679             25.8749
   6      26  24.1786          20.3811             27.9761
   9      32  27.2857          23.8891             30.6823
  12      30  30.3929          26.5954             34.1904
  15      33  33.5             28.6965             38.3035
  18      34  36.6071          30.4838             42.7304
  21       .  39.7143          32.1193             47.3093
  24       .  42.8214          33.6758             51.967

 

 

What I want to understand is how the 95% confidence interval in the proc reg procedure is calculated for the extended period - in my example above, that is for months 21 and 24.  

 

(I am still working through your links).

 

Thanks for your help.

nsns
Obsidian | Level 7
The guidelines specifically say regression analysis. Since this is what is requested, I would rather stick with the regression. Thanks for the links that you sent. They are interesting and helpful and give insight - although I am not sure that they resolve my question. Will continue to review them. Thanks.
Ksharp
Super User
If you need "extrapolation of regression", then you need SAS/ETS.
I suggest posting your question in the Forecasting forum.
Check PROC ARIMA, PROC UCM, PROC ESM, PROC FORECAST, ...
nsns
Obsidian | Level 7
It's been a very long time since I've done time series and forecasting. I will look into this; however, since the FDA guidelines specifically say regression, I feel I should work with that. Thanks for your input and your suggestion.
SteveDenham
Jade | Level 19

It helps to have some experience with regulators on this.  Linear extrapolation of stability data has been done for years.  While that may not make it "right", there is an impressive track record.  The problem with shifting to any of the time series methods is that the number of time points on these studies is often too small to be able to estimate the parameters, the time points are not equally spaced, and the measurements are not on the same sample (i.e., several samples are taken at the initiation of the stability testing period, and then destructively analyzed at pre-determined time points).  Consequently, a lot of time series methods aren't robust enough to deal with this.

 

As for survival analysis, I think you would need multiple samples to analyze at each pre-determined time point to be able to estimate the hazard ratio (failure rate). Unless the synthesis has been scaled up to a near-production level, there may not be sufficient test article to carry out a decent survival analysis.

 

So for once, I can understand the use of a not-quite-right method.

 

SteveDenham

nsns
Obsidian | Level 7
Thanks for your reply, Steve. I agree with you. Additionally, since the FDA has included specific conditions for the stability assessments (i.e., how far out you may extrapolate, etc.) and long-term testing has to follow the extrapolated values (for confirmation, I believe), I think that using this method is OK. Thanks for your input.
