First of all, thank you for your compliments. It is a pleasure discussing statistical issues with you. What is more, I happened to log in to my SAS account tonight to download the appendix materials for a SAS book I was reading, and I was surprised to find that the number of views of our discussions has reached a staggering level of some 2,000, outnumbering virtually every other thread on this Community. It seems the topics we have been discussing are of particular interest to a great number of people, which further motivates me to continue the discussion.
However, I would like to say a big "sorry" for responding to your latest questions so late. I was extremely busy last October, when our discussion took place. Now that I have finished the projects I was working on, I have time to discuss the issues you raised in depth.
@JanetXu wrote:
Now only for the p-value. So, in order to get a pooled p-value after imputation, pooling 'z' is the correct way. In the proc mianalysis, the estimate of my pooled 'z' is just the simple mean (Robin's rule is this, right), which should follow normal, by central limit theorem, correct? then my question will be: 'what quantity' follows the non-central t? It seem there is a contradict.
I am afraid you were not yet very familiar with imputation when you posted your replies. First of all, the paragraph contains several small mistakes. The rule being applied is "Rubin's rule," not "Robin's rule." It is named after Donald Rubin, who made remarkable contributions to the field of multiple imputation. In addition, the SAS procedure that handles the pooling step is called "PROC MIANALYZE," not "proc mianalysis."
Second, let me explain the pooling process further. Rubin's rules refer to calculating the quantity of interest separately on each imputed dataset and then pooling those results. Let m denote the number of imputed datasets (i.e., the number of times you impute), let Q denote the quantity whose true population value you want to know but which is complicated by missing values, and let Q1, Q2, ..., Qm denote the values of that quantity calculated from each imputed dataset. To apply Rubin's rules, Q1, Q2, ..., Qm should follow t distributions (please note: not non-central t distributions) or be asymptotically normal. If that condition is satisfied, you estimate Q by U = (Q1 + Q2 + ... + Qm)/m, the simple mean of the per-imputation estimates. U, in turn, follows a t distribution. Hypothesis tests on Q can therefore be replaced by tests on U, which produce the p-values you wanted in the first place, and since U follows a t distribution, those p-values are calculated by referencing U against a t distribution.
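To make this concrete, here is a minimal sketch of the usual SAS workflow behind that pooling step. The dataset name, variable names, seed, and number of imputations (m = 5) are hypothetical placeholders, not taken from your analysis:

/* Step 1: create m = 5 imputed datasets (names and seed are hypothetical) */
proc mi data=mydata nimpute=5 seed=20231001 out=mi_out;
   var y x1 x2;
run;

/* Step 2: fit the analysis model separately on each imputed dataset,
   saving the parameter estimates and their covariance matrices */
proc reg data=mi_out outest=est covout noprint;
   model y = x1 x2;
   by _Imputation_;
run;

/* Step 3: pool the per-imputation estimates Q1, ..., Qm with Rubin's rules;
   PROC MIANALYZE reports the pooled estimate, its standard error, and a
   t-based p-value for each model effect */
proc mianalyze data=est;
   modeleffects Intercept x1 x2;
run;

The key point is Step 3: whatever analysis you run in Step 2, the pooled inference reported by PROC MIANALYZE is referenced against a t distribution.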
Let us return to your specific question now. The central limit theorem concerns the distribution of means, which has little to do with U here. You can of course argue that U is itself a particular kind of mean, so the central limit theorem should apply to it as well. That is true, but in both theory and practice we reference U against a t distribution, not a normal distribution.
By the time you read this line, you can skim back through the passages I have written and try to find a place where a non-central t distribution appears. There is none, right? So in the entire framework a practitioner needs to master for handling missing data, no quantity follows a non-central t distribution.
@JanetXu wrote:
Maybe, the answer is, following normal is approximate, following t is precise ?
No, not true. When we apply Rubin's rules, we always reference U against t distributions.
@JanetXu wrote:
And for p-value, what is the p-value from SAS output for, for the 'some qunatity' follows the non-central t, correct?
Since my estimate for my pooled z follows normal, why should I report that p-value?
The answer to the first question is, again, "no." Please refer to my elaboration on Rubin's rules above for details.
As for the second question, the pooled z's (which I denoted as U) follow a t distribution, which is why the p-value in the PROC MIANALYZE output is computed from a t distribution and is the one you should report.
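If it helps to see where that t distribution enters, here is a small sketch that applies Rubin's rules by hand to m = 5 hypothetical per-imputation estimates and standard errors. The numbers are invented purely for illustration, not taken from your output:

/* Rubin's rules by hand; the estimates and standard errors below are made up */
data rubin;
   array q[5]  (0.52 0.47 0.55 0.49 0.51);      /* per-imputation estimates Q1..Qm */
   array se[5] (0.110 0.105 0.118 0.102 0.109); /* their standard errors           */
   m = 5;

   /* Pooled estimate U: the simple mean of the per-imputation estimates */
   u = 0;
   do i = 1 to m;
      u = u + q[i] / m;
   end;

   /* Within-imputation variance (average squared SE) and
      between-imputation variance (sample variance of the estimates) */
   w = 0; b = 0;
   do i = 1 to m;
      w = w + se[i]**2 / m;
      b = b + (q[i] - u)**2 / (m - 1);
   end;

   /* Total variance and Rubin's degrees of freedom */
   total = w + (1 + 1/m) * b;
   df    = (m - 1) * (1 + w / ((1 + 1/m) * b))**2;

   /* t statistic for H0: Q = 0 and its two-sided p-value from a t distribution */
   tstat = u / sqrt(total);
   pval  = 2 * (1 - probt(abs(tstat), df));

   put u= w= b= total= df= tstat= pval=;
run;

Notice that the p-value comes from PROBT, i.e., from a t distribution with Rubin's adjusted degrees of freedom; this is essentially the calculation PROC MIANALYZE performs for you.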