Thank you for your reply!
@SteveDenham wrote:
So given the definition of left censoring that PROC SEVERITY uses, your response value could potentially be negative. Zero and negative values aren't supported by several of the interesting distributions available to you in SEVERITY. Would those values be meaningful, or even observable? (I only ask as I don't know what the response variable is).
In fact, PROC SEVERITY adopts a latent variable modeling paradigm for dealing with censored data. For instance, suppose I am the manager of an insurance company and wish to study reimbursement. I plan to build a linear regression model with the amount of reimbursement (call it "y") as the dependent variable and several independent variables (call them "x1", "x2", ..., "xn"). Because y cannot be less than 0, it violates the assumption of multiple regression that the dependent variable can take any real value.
The latent variable modeling paradigm assumes that y is a partially observed version of a latent variable, called y* here. In this example, the amount of reimbursement is tied to y* such that if y* ≥ 0, then y = y*; if y* < 0, then y = 0. The variable y* is latent because it is not fully observed. If the insurance company does have records where y = 0, then the manifest variable y is called censored.
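To make the y* versus y distinction concrete, here is a minimal simulation sketch in Python (not SAS, and not what PROC SEVERITY does internally). The single covariate, the coefficients, and the normal error are all assumptions I picked purely for illustration.

```python
import random

def simulate_latent(n, beta0=-1.0, beta1=2.0, sigma=1.0, seed=0):
    """Draw (x, y_star) pairs from a hypothetical linear model with normal errors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)  # latent, never fully observed
        rows.append((x, y_star))
    return rows

def censor_at_zero(rows):
    """Observed y equals y_star when y_star >= 0, and 0 otherwise."""
    return [(x, max(y_star, 0.0)) for x, y_star in rows]

latent = simulate_latent(1000)
censored = censor_at_zero(latent)

# Every subject stays in the data; the negative latent values pile up at y = 0.
n_zero = sum(1 for _, y in censored if y == 0.0)
print(f"{n_zero} of {len(censored)} records are censored at 0")
```

The point to notice is that the censored dataset still has one row per subject. Only the value of y is coarsened, not the sample itself.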
However, if the insurance company's dataset only contains records with reimbursement, namely all subjects without reimbursement are left undocumented, then the data only contain cases with y > 0 (strictly greater, no equal sign). In this case, the manifest variable is called truncated. Of note, the latent variable modeling paradigm can be readily applied to truncation as well. In fact, you can use the same collection of statistical tools built on the latent variable assumption to handle both censoring and truncation. However, researchers have to be fully aware of whether their data are truncated or censored so as to correctly report and interpret their results, because truncation and censoring are fundamentally different concepts.
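As a rough sketch of why the distinction matters, the Python snippet below contrasts the log-likelihood contribution of one observation under censoring at 0 and under truncation at 0. It assumes a normal latent variable with mean mu and standard deviation sigma (the classic Tobit and truncated-regression setup), which is my simplification for illustration; the function names are hypothetical, and PROC SEVERITY itself works with other distributions.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik_censored(y, mu, sigma):
    """Censoring at 0: a record with y = 0 contributes log P(y* < 0)."""
    if y <= 0.0:
        return math.log(norm_cdf(-mu / sigma))
    return math.log(norm_pdf((y - mu) / sigma)) - math.log(sigma)

def loglik_truncated(y, mu, sigma):
    """Truncation at 0: only y > 0 is recorded, so the density is rescaled by P(y* > 0)."""
    if y <= 0.0:
        raise ValueError("a truncated sample cannot contain y <= 0")
    log_density = math.log(norm_pdf((y - mu) / sigma)) - math.log(sigma)
    return log_density - math.log(norm_cdf(mu / sigma))

# Made-up records: censored data keep the zeros, truncated data never see them.
records = [0.0, 0.0, 1.2, 3.4]
truncated_records = [y for y in records if y > 0.0]

print(loglik_censored(0.0, mu=0.0, sigma=1.0))   # contribution of a censored zero
print(loglik_truncated(1.2, mu=0.0, sigma=1.0))  # contribution of a truncated positive
```

Both functions share the same latent normal, which is the sense in which one toolkit covers both cases. The difference is that censored zeros enter the likelihood through a probability mass, while truncation rescales the density of every positive record.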
Before I close, I would like to point out that the latent variable modeling paradigm is not the only approach for modeling censored or truncated data. Other modeling paradigms include the two-part model approach and the hurdle model approach. They are, however, not supported by PROC SEVERITY. For a concise yet comprehensive review, see "Two-Part Models for Zero-Modified Count and Semicontinuous Data" (SpringerLink).