I have a question about PROC SEVERITY and the Weibull distribution. In the attached SAS file, I simulated a Weibull-distributed random variable with 1000 observations and appended 4 zero values to the random vector. I then estimated the parameters of the Weibull distribution using PROC SEVERITY, and it was still able to give me an estimate. However, as the PDF at x=0 is 0 and the log likelihood is not defined at x=0, SAS should not give me an estimation. Instead, it should give me an error. I tested the data exported from SAS with R’s fitdistrplus package, and indeed it gives me an error when trying to estimate the parameters. I want to know what PROC SEVERITY is doing when it fits a Weibull distribution to a dataset containing zero values.
The reason I am asking is that I have a real dataset that I want to model with a Weibull distribution, and that dataset also contains zero values.
Here is my SAS code and R code.
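proc iml;
call randseed(12345);
x=J(1000,1);                       /* allocate a 1000 x 1 vector */
call randgen(x,'Weibull',1.5,1);   /* Weibull draws: shape=1.5, scale=1 */
x=x//{0,0,0,0};                    /* append four zero values */
x=T(ranperm(x));                   /* shuffle, then transpose back to a column */
y=loc(x=0);                        /* positions of the zeros after shuffling */
print (y);
call series(1:nrow(x),x);
create weibull from x;             /* writes the values to variable COL1 */
append from x;
close weibull;
quit;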
proc severity data=weibull;
loss col1;
dist weibull;
run;
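The R code below reads a file named weibull.csv, but the export step is not shown in the post. A minimal sketch of one way to produce that file, assuming a relative output path (the path is a placeholder, not taken from the original attachment):

```sas
/* Sketch only: write the WEIBULL data set to a CSV file for use in R. */
/* The output path is a placeholder; adjust it to a writable location. */
proc export data=weibull
            outfile="weibull.csv"
            dbms=csv
            replace;
run;
```

PROC EXPORT writes the variable name as a header row by default, which is consistent with the R code referring to the column as COL1.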
#R code--------------------------------------------------------------
library(MASS)
library(fitdistrplus)
library(ggplot2)
#---------------------------------------------------------------
weibull=read.csv('weibull.csv')
fit_w=fitdist(weibull$COL1,'weibull')
ggplot(data = weibull)+geom_histogram(aes(x=COL1))
> However, as the PDF at x=0 is 0 and the log likelihood is not defined at x=0, SAS should not give me an estimation.
> Instead, it should give me an error.
You should probably contact Technical Support for a full answer. I am not an expert on PROC SEVERITY, but here are a few observations:
- The likelihood function for Weibull is defined for x=0. Only the log-likelihood is undefined when x=0.
- As stated in the PROC SEVERITY documentation, the initial values for the parameter estimates are found (for Weibull) by using a method of percentiles, so the procedure can find an initial estimate.
This makes me suspect that there is special handling for x=0, but I do not know the details. I will point out that if you change the "bad values" to be negative:
x=x//{-1e-6,-1e-6,-1e-6,-1e-6};
then the SAS log reports
WARNING: For at least one observation, variable COL1 has a negative value. Ignoring such observations.
(Note that those observations are dropped and the procedure continues.)
Thank you, Rick. My guess is the same. I suspect SAS also drops zero values, without issuing any notice, when it optimizes the log likelihood.
I don't think they are dropped. Observations that are dropped are not given predicted values. You can look in the OUTPUT data set to see that the predicted PDF and CDF for these values are 0. In contrast, if you use a negative value, those observations are assigned missing values for the PDF and CDF.
proc severity data=weibull;
loss col1;
dist weibull;
output out=out copyvars=(col1) functions=(cdf pdf);
run;
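To check this directly, one could print only the rows of the output data set whose loss value is zero or negative. This is a small sketch that assumes the OUT data set created by the step above; it prints all variables rather than naming the PDF and CDF columns, since their exact names are not shown in this thread:

```sas
/* Sketch only: list the nonpositive loss values in the OUT data set     */
/* together with whatever CDF and PDF columns PROC SEVERITY wrote there. */
proc print data=out;
   where col1 <= 0;
run;
```

The WHERE condition keeps only rows with a nonmissing COL1 that is at most zero, so the same step works for both the zero-valued and the negative-valued versions of the experiment.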
Another note is that the Weibull pdf is not defined if the shape parameter is strictly less than 1
I think you meant to say the PDF is not defined at x=0, which is correct.
Yes. The Weibull PDF is not defined at x=0 if the shape parameter is strictly less than 1.
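To make the role of the shape parameter concrete, here is the Weibull density and its logarithm in a generic shape/scale notation. The symbols k (shape) and \lambda (scale) are chosen here for illustration; the PROC SEVERITY documentation uses its own parameter names.

```latex
f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}
                   \exp\!\left[-\left(\frac{x}{\lambda}\right)^{k}\right],
                   \qquad x > 0

\log f(x; k, \lambda) = \log k - k \log \lambda + (k-1)\log x
                        - \left(\frac{x}{\lambda}\right)^{k}

\lim_{x \to 0^{+}} f(x; k, \lambda) =
\begin{cases}
  0           & k > 1 \\
  1/\lambda   & k = 1 \\
  \infty      & k < 1
\end{cases}
```

The (k-1) log x term is what fails at x=0. For k > 1 the density tends to 0, so the likelihood is defined but its logarithm is not. For k < 1 the density itself diverges. Only k = 1, the exponential case, gives a finite log-density at zero.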