Hello,

I am trying to singly impute missing data with stochastic regression using proc mi in SAS/STAT 14.1. Here is a sample of my script. (Only VAR1 has any missingness, by sheer luck.)

    proc mi data = INPUTFILE out = OUTPUTFILE minimum = 0.00 maximum = 1.00 nimpute = 1 seed = 123456;
      var VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 RACE RISK VAR11 VAR12 VAR13 VAR14 VAR15 VAR16;
      class RACE RISK;
      fcs nbiter = 1 reg (VAR1 = VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 RACE RISK VAR11 VAR12 VAR13 VAR14 VAR15 VAR16);
    run;

I am using a seed so that the procedure generates the same output data file every time.

I noticed that when I sort the input file by race before the proc mi step (just proc sort data = inputfile; by race; run;), without asking SAS to impute by race, I get different imputed values in the output file than when the input is sorted by ID. From what I understand, the sort order of the input data set should not matter for proc mi.

At first I thought SAS might be imputing separately by race, but when I added a "by race;" statement to proc mi, the estimates were different again, so I don't think that is what is happening.

To summarize my estimates:

1. One set of estimates when the file is sorted by ID before proc mi.
2. A second set of estimates when the file is sorted by race before proc mi.
3. A third set of estimates when I add a "by race;" statement to proc mi.

How the file is sorted should not matter if I don't have a "by" statement in proc mi, right? For example, the SAS documentation for proc mi (https://support.sas.com/documentation/onlinedoc/stat/141/mi.pdf), on page 5890, says "note that the input data set does not need to be sorted in any order."

Can anyone help me understand why I'm getting different estimates, whether this is consequential, and if so, how consequential it might be? Would the estimates somehow be less valid if the file were sorted by race instead of ID, even without a "by" statement in the proc mi step?

Thank you for your time.
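
In case it helps anyone reproduce this, below is a minimal sketch of how the first two runs could be set up and compared side by side. It is only an illustration under some assumptions: the identifier variable is assumed to be named ID, and the data set names by_id, by_race, imp_id, and imp_race, as well as the macro variables preds and vars, are placeholders that do not appear in my actual script.

```sas
/* Predictor list and full variable list, copied from the original step */
%let preds = VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 RACE RISK
             VAR11 VAR12 VAR13 VAR14 VAR15 VAR16;
%let vars  = VAR1 &preds;

/* Run 1: input sorted by ID (ID is an assumed variable name) */
proc sort data = INPUTFILE out = by_id;
  by ID;
run;

proc mi data = by_id out = imp_id
        minimum = 0.00 maximum = 1.00 nimpute = 1 seed = 123456;
  var &vars;
  class RACE RISK;
  fcs nbiter = 1 reg (VAR1 = &preds);
run;

/* Run 2: same step and same seed, but input sorted by RACE */
proc sort data = INPUTFILE out = by_race;
  by RACE;
run;

proc mi data = by_race out = imp_race
        minimum = 0.00 maximum = 1.00 nimpute = 1 seed = 123456;
  var &vars;
  class RACE RISK;
  fcs nbiter = 1 reg (VAR1 = &preds);
run;

/* Put both outputs back in ID order so rows can be matched */
proc sort data = imp_id;
  by ID;
run;

proc sort data = imp_race;
  by ID;
run;

/* Row-by-row comparison of the imputed variable, matched on ID */
proc compare base = imp_id compare = imp_race;
  id ID;
  var VAR1;
run;
```

Since only VAR1 has missing values, any differences that proc compare reports should be limited to the rows where VAR1 was originally missing. The third case could be checked the same way by adding a "by RACE;" statement to the second proc mi step.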