Programming the statistical procedures from SAS

Issue with data set sorting generating different estimates with proc mi

New Contributor
Posts: 3

Hello,

I am trying to singly impute missing data with stochastic regression, using proc mi in SAS/STAT 14.1. Here is a sample of my script. (By sheer luck, only VAR1 has any missing values.)

proc mi data = INPUTFILE out = OUTPUTFILE minimum = 0.00 maximum = 1.00 nimpute = 1 seed = 123456;
  var VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 RACE RISK VAR11 VAR12 VAR13
      VAR14 VAR15 VAR16;
  class RACE RISK;
  fcs nbiter = 1 reg (VAR1 = VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 RACE RISK VAR11 VAR12 VAR13
      VAR14 VAR15 VAR16);
run;

I am using a seed so that the procedure generates the same output data file every time.

I noticed that when I sort the file by race before proc mi, I get different imputed values in the output file than when it is sorted by ID. I am not asking SAS to impute by race; I only run proc sort data = inputfile; by race; run; before the proc mi step. From what I understand, the sort order of the input data set should not matter for proc mi.

At first I thought SAS might be imputing separately by race, but when I added a "by race;" statement to proc mi, the estimates were different again. So I don't think that is what is happening.

To summarize my estimates:

1. one set of estimates when the file is sorted by ID before proc mi.

2. one set of estimates when the file is sorted by race before proc mi.

3. a third set of estimates when I add a "by race" statement to proc mi.

How the file is sorted should not matter if I don't have a "by" statement in proc mi, right? For example, the SAS documentation for proc mi (https://support.sas.com/documentation/onlinedoc/stat/141/mi.pdf), on page 5890, says "note that the input data set does not need to be sorted in any order."

Can anyone help me understand why I'm getting different estimates, whether this is consequential, and if so, how consequential it might be? Would the estimates somehow be less valid if the file were sorted by race instead of ID, despite there being no "by" statement in the proc mi step?

Thank you for your time.


All Replies
Esteemed Advisor
Posts: 7,057

Re: Issue with data set sorting generating different estimates with proc mi

I am NOT an expert regarding proc mi, but I would think that your using fcs nbiter = 1 is what is causing your differences. I'm not sure what the documentation means by "number of burn-in iterations", but I have to think it is limiting the number of records used for the regression. If that is the case, then different sort orders would definitely be expected to produce different results.

HTH,

Art, CEO, AnalystFinder.com

New Contributor
Posts: 3

Re: Issue with data set sorting generating different estimates with proc mi

Hi Art,

Thank you so much for your feedback. Clearly, I am not an expert on MI either. I have traditionally used other approaches (e.g., maximum likelihood) with missing data, so MI is new to me too.

That's a good point about nbiter. I thought it was requesting a single imputation, and I agree that the "number of burn-in iterations" wording is a little unclear. I don't think the nbiter option is causing the different estimates, though. I removed that option and re-ran proc mi, first sorting by ID and then again by race, and once again I got different estimates from the two runs. But still, it was worth a shot, and at least something has been narrowed down!

New User
Posts: 1

Re: Issue with data set sorting generating different estimates with proc mi

I have the same issue: different data orders before imputation generate different imputed values. Could you please keep us posted about this?
Solution
‎02-09-2017 03:58 PM
Super User
Posts: 6,137

Re: Issue with data set sorting generating different estimates with proc mi

Doesn't the word STOCHASTIC imply randomness? If you give it the same seed, so that it randomly selects the same observation numbers, but you have sorted the observations in a different order, then it will use different values in the calculations.
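
The order dependence described above can be illustrated with a small DATA step analogy. This is only a sketch of the general idea (a seeded random stream consumed row by row), not proc mi's internal algorithm; the data set names, the toy values, and the NOISE variables are all made up for illustration.

```sas
/* Toy data: DEMO and its values are illustrative, not the thread's data. */
data DEMO;
  input ID RACE;
  datalines;
1 2
2 1
3 2
4 1
;
run;

/* Attach one seeded random draw per row while the rows are in ID order. */
data DRAWS_BY_ID;
  set DEMO;
  if _n_ = 1 then call streaminit(123456);  /* same seed as in the thread */
  NOISE_ID_ORDER = rand('normal');
run;

/* Same seed, hence the same stream, but the rows now arrive in RACE order. */
proc sort data = DEMO out = DEMO_BY_RACE;
  by RACE;
run;

data DRAWS_BY_RACE;
  set DEMO_BY_RACE;
  if _n_ = 1 then call streaminit(123456);
  NOISE_RACE_ORDER = rand('normal');
run;

/* Line the two runs up by ID. The k-th draw is identical in both runs,
   but it lands on a different ID, so the per-ID values differ. */
proc sort data = DRAWS_BY_RACE;
  by ID;
run;

data SIDE_BY_SIDE;
  merge DRAWS_BY_ID DRAWS_BY_RACE;
  by ID;
run;

proc print data = SIDE_BY_SIDE noobs;
run;
```

If the same kind of thing happens inside proc mi, then sorting by a unique key such as ID immediately before the proc mi step should pin the row order, so that the same seed reproduces the same imputed values from run to run.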

New Contributor
Posts: 3

Re: Issue with data set sorting generating different estimates with proc mi

Hi Tom,


Thanks so much for your reply! That's what I had figured it was doing (somehow imputing differently depending on the order of observations), but it is helpful to know that others think the same. Indeed, that seems inconsequential in the long term, in the sense that neither set of imputed values is more or less valid than the other for future analyses. In support of that, the means and SDs of scores among imputed participants only are very close (within .01 of each other), even though the two imputed scores for any one participant can differ.
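
A check along these lines could be sketched as follows. The names OUT_BY_ID and OUT_BY_RACE are hypothetical stand-ins for the proc mi output data sets from the ID-sorted and race-sorted runs, and ID is assumed to be a unique participant key; INPUTFILE and VAR1 are taken from the script earlier in the thread.

```sas
/* OUT_BY_ID and OUT_BY_RACE are hypothetical names for the two proc mi
   outputs; ID is assumed to uniquely identify a participant. */
proc sort data = INPUTFILE   out = ORIG_SORTED; by ID; run;
proc sort data = OUT_BY_ID   out = IMP_ID;      by ID; run;
proc sort data = OUT_BY_RACE out = IMP_RACE;    by ID; run;

data IMPUTED_ONLY;
  merge ORIG_SORTED (keep = ID VAR1 rename = (VAR1 = VAR1_ORIG))
        IMP_ID      (keep = ID VAR1 rename = (VAR1 = VAR1_ID_SORT))
        IMP_RACE    (keep = ID VAR1 rename = (VAR1 = VAR1_RACE_SORT));
  by ID;
  if missing(VAR1_ORIG);                  /* keep only participants whose VAR1 was imputed */
  DIFF = VAR1_ID_SORT - VAR1_RACE_SORT;   /* per-participant difference between the runs */
run;

/* Compare the distributions of the imputed values from the two runs. */
proc means data = IMPUTED_ONLY n mean std min max;
  var VAR1_ID_SORT VAR1_RACE_SORT DIFF;
run;
```

Similar means and SDs for the two imputed columns, alongside nonzero values of DIFF, would match the pattern described in the post.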

 

Thank you, again, for your help!

☑ This topic is SOLVED.
