Solved: Estimating Confidence Interval of predicted point from a lognormal distribution

jitb · Posted 09-13-2020 05:18 PM

I am trying to build a confidence interval of a predicted point in a time series that is lognormally distributed. I am predicting the value of just one future time point. My process has been to randomly simulate 1,000 distributions that have the same lognormal parameters (theta, sigma) as the original time series. The simulated distributions have one extra time point, i.e. the point value I am trying to predict. I choose the simulated distribution that has the least weighted average difference from the original series. Once I choose the best simulated distribution, I use the extra time point value as my prediction. Next, I would like to build a confidence interval around this predicted point. Would I be able to do this as

Point Value +- 1.96*sigma/sqrt(n)....at 95% CI?

Any help on this would be much appreciated. Thanks.

PaigeMiller · Posted 09-13-2020 07:00 PM

@jitb wrote:

Would I be able to do this as

Point Value +- 1.96*sigma/sqrt(n)....at 95% CI?

I don't think so, not if you want to follow a lognormal distribution, as ±1.96 is likely not a meaningful quantity of a lognormal distribution. Whatever distribution you select from the process you describe, you find the spot on the distribution where 2.5% of the distribution is to the left, and the spot on the distribution where 2.5% of the distribution is on the right. These define the interval you want.

--
Paige Miller
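
The percentile idea above can be sketched in a few lines. This is only an illustration, not anything posted in the thread: it assumes the lognormal is described by the mean `mu` and standard deviation `sigma` of the logged values, and the numbers used are hypothetical. It reads off the points that leave 2.5% in each tail and shows that the resulting interval is not symmetric around the median, which is why a symmetric "± 1.96" rule on the original scale does not fit.

```python
import math
from statistics import NormalDist


def lognormal_percentile_interval(mu, sigma, coverage=0.95):
    """Return the (lower, upper) points that leave equal probability in each tail.

    mu and sigma are the mean and standard deviation of log(X); this
    parameterization is an assumption made for the sketch.
    """
    tail = (1.0 - coverage) / 2.0
    std_normal = NormalDist()
    z_low = std_normal.inv_cdf(tail)          # about -1.96 for 95% coverage
    z_high = std_normal.inv_cdf(1.0 - tail)   # about +1.96 for 95% coverage
    return math.exp(mu + sigma * z_low), math.exp(mu + sigma * z_high)


# Hypothetical parameters, for illustration only.
mu, sigma = 1.0, 0.8
lower, upper = lognormal_percentile_interval(mu, sigma)
median = math.exp(mu)

print(f"2.5th percentile:  {lower:.3f}")
print(f"median:            {median:.3f}")
print(f"97.5th percentile: {upper:.3f}")
print(f"distance below median: {median - lower:.3f}")
print(f"distance above median: {upper - median:.3f}")
```

The upper bound sits much farther from the median than the lower bound does, reflecting the right skew of the lognormal.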

jitb · Posted 09-13-2020 08:45 PM

Thank you, Paige, for your response. You mean, take 2.5 and 97.5 percentiles of the distribution as the bounds? That makes sense. A further query, would you think taking the mean of the predicted point from the top 100 simulated distributions (based on my weighted score) would give me a better estimate? I guess that's why I was thinking of 1.96 from the central limit theorem. This is a different question, I know.

PaigeMiller · Posted 09-14-2020 06:37 AM

Yes, if you are going to average 100 points from the top 100 distributions, then I would think the Central Limit Theorem would apply. But you still ought to check how similar these points are by plotting the points and the distributions. For example, if the points somehow turn out to be bimodal (unlikely if they are all lognormal, but you never know), or if there is an extreme outlier or two, then maybe the Central Limit Theorem doesn't get you there.

--
Paige Miller

jitb · Posted 09-14-2020 10:01 AM

Hi Paige,

Yes...I need to plot the points. I will take your suggestion and use the 2.5 and 97.5 percentiles to construct the CI. Thanks so much for your advice on this!

SteveDenham · Posted 09-14-2020 08:05 AM

You have what you need for a bootstrap estimate of the mean and confidence interval. I wouldn't choose any "best" simulation, as that is going to be strictly a function of the random values used to generate your time series. Instead, your best predictor is simply the mean of the new point across the 1000 simulations, and the confidence bounds would be as @PaigeMiller pointed out: the 2.5th percentile and the 97.5th percentile. You can get all of these with one call to PROC MEANS.

SteveDenham
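
A rough Python sketch of the approach described above, offered as an illustration rather than as the code anyone in the thread ran. It assumes each simulated series is a set of independent lognormal draws (a real time series may be autocorrelated), and the parameters `MU`, `SIGMA`, the series length `N_OBS`, and the helper `simulate_extended_series` are all hypothetical. It keeps the extra point from every simulation, then summarizes those 1,000 values by their mean, median, and 2.5th and 97.5th percentiles.

```python
import random
import statistics

random.seed(2020)  # fixed seed so the sketch is repeatable

# Hypothetical values: log-scale mean and standard deviation, series length.
MU, SIGMA = 1.0, 0.8
N_OBS = 60
N_SIMS = 1000


def simulate_extended_series(n_obs):
    """Simulate n_obs points plus one extra 'future' point (independent draws)."""
    return [random.lognormvariate(MU, SIGMA) for _ in range(n_obs + 1)]


# Keep the extra (last) point from every simulation, not only from a "best" one.
extra_points = [simulate_extended_series(N_OBS)[-1] for _ in range(N_SIMS)]

point_estimate = statistics.fmean(extra_points)
median = statistics.median(extra_points)

# n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
cuts = statistics.quantiles(extra_points, n=40)
lower, upper = cuts[0], cuts[-1]

print(f"mean of extra points:   {point_estimate:.3f}")
print(f"median of extra points: {median:.3f}")
print(f"2.5th to 97.5th percentile: {lower:.3f} to {upper:.3f}")
```

Because the draws are right-skewed, the mean lands above the median, so it is worth looking at both before settling on a point estimate.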

jitb · Posted 09-14-2020 10:07 AM

Hi Steve,

Yes, I think I will look at the mean and median of the 1,000 observations. I couldn't find a way of getting the 2.5 and 97.5 percentiles from Proc Means, but was able to get them from Proc Univariate with the pctlpts option in the output statement. Thanks so much for your insights on this.

Ksharp · Posted 09-14-2020 08:24 AM

I don't think so. 1.96 is for the Normal distribution, NOT for the lognormal.
Calling @Rick_SAS

jitb · Posted 09-14-2020 10:09 AM

Yes...I am discarding the 1.96 for this. Thanks.

SteveDenham · Posted 09-14-2020 01:39 PM

1.96 is fine for a large lognormal population, so long as you are doing the calculations in the log space. Confidence bounds on the original scale can be obtained by computing the bounds in the log space with the 1.96 factor and then exponentiating them. This works because the logs of the values are assumed to follow a Gaussian distribution, whose variance is not a function of its mean. For analysis of variance purposes, this means that the residuals in the log space are normally distributed.

SteveDenham
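
The log-space calculation described above might look like the following sketch. The sample is simulated from hypothetical parameters purely for illustration. Two intervals are shown because the thread's original formula divides by sqrt(n): that version gives bounds for the mean of the logs, which back-transform to bounds for the median (geometric mean) on the original scale. The version without sqrt(n) approximates the central 95% range for a single new value when n is large, since it ignores uncertainty in the estimated parameters.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data standing in for the original series.
sample = [random.lognormvariate(1.0, 0.8) for _ in range(200)]

logs = [math.log(x) for x in sample]
n = len(logs)
log_mean = statistics.fmean(logs)
log_sd = statistics.stdev(logs)
Z = 1.96  # two-sided 95% normal quantile

# Bounds for the mean of the logs, exponentiated back to the original scale.
# These bracket the median (geometric mean), not a single future value.
half_width_mean = Z * log_sd / math.sqrt(n)
ci_median = (math.exp(log_mean - half_width_mean),
             math.exp(log_mean + half_width_mean))

# Approximate central 95% range for one new value (large-n approximation).
half_width_point = Z * log_sd
range_new_point = (math.exp(log_mean - half_width_point),
                   math.exp(log_mean + half_width_point))

print(f"interval for the median:  {ci_median[0]:.3f} to {ci_median[1]:.3f}")
print(f"range for one new value:  {range_new_point[0]:.3f} to {range_new_point[1]:.3f}")
```

Both intervals are symmetric on the log scale and become multiplicatively symmetric around exp(log_mean) after exponentiation; neither bound can fall below zero.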

jitb · Posted 09-15-2020 08:46 AM

Yes....thanks for pointing that out, Steve. I get that. My concern is that about 80% of the variable values in the original series are between 1 and 5. The remaining 20% range from 6 to 33. If I take the mean of the 1000 simulated distributions, it will, in most cases, lie between 2 and 3. The confidence interval will be very wide, e.g. between 1 and 13. I'm thinking about how to handle predicting these outliers. Maybe a mixed distribution? I've never done a mixed distribution before. But, thanks for your insights. Much appreciated.

SteveDenham · Posted 09-15-2020 09:57 AM

@jitb - it might be a mixture, in which case the bootstrap confidence bound is more likely to provide proper coverage. However, on the log scale, your values range from 0 to about 3.5 with a probable peak around 1. So a mean on the original scale of 2.7 or thereabouts makes sense.

However, I sense something interesting here. It looks like your raw values are bounded away from 0. Have you considered a gamma distribution for the values? It has a closed form mean and variance (small bias involved compared to ML estimators). And there is a compound gamma distribution (also with closed form estimators) that is essentially a mixture of two gamma distributions having the same mean but differing variance. PROC FMM on the 1000 simulated values with 2 gamma components compared to a single component by AIC sounds like a good approach.

In any case, that bootstrap mean is still likely to be your best estimator of central tendency. Using the least weighted average difference should approximate the median, so you could check the 50th percentile value of the bootstrap sample against it. I worry that the "best" may be way out toward one of the tails.
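
As a small illustration of the closed-form gamma estimates mentioned above, the sketch below uses the method of moments: shape = mean² / variance and scale = variance / mean. Treating "closed form" as method of moments is an assumption on my part, the helper name `gamma_moment_estimates` is hypothetical, and the data are simulated with made-up parameters. It does not reproduce the two-component PROC FMM comparison.

```python
import random
import statistics


def gamma_moment_estimates(values):
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    mean = statistics.fmean(values)
    variance = statistics.variance(values)
    shape = mean * mean / variance
    scale = variance / mean
    return shape, scale


random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical "true" parameters used only to generate test data.
true_shape, true_scale = 2.0, 1.5
draws = [random.gammavariate(true_shape, true_scale) for _ in range(20000)]

shape_hat, scale_hat = gamma_moment_estimates(draws)
print(f"estimated shape: {shape_hat:.3f}")
print(f"estimated scale: {scale_hat:.3f}")
print(f"implied mean:    {shape_hat * scale_hat:.3f}")
```

By construction, the product of the two estimates equals the sample mean, so the fitted gamma always reproduces the observed mean and variance exactly.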

jitb · Posted 09-16-2020 09:55 PM

An interesting suggestion. I will try the compound gamma distribution. Proc Severity is indicating a Burr distribution for the tails. If I do a mixed distribution, would I have all the tails after a certain time period? That would not mimic my original time series well. Thanks, Steve!
