I have data in discrete choice format and a question about using PROC MI with it. I am using imputation to fill in some missing independent variables as well as the dependent variable, choice. The problem is that each pair of rows must be mutually exclusive (one row is the chosen option, the other is the option not chosen), which PROC MI does not take into account. Is there a way to enforce this condition? Should I be using a different imputation method? Should I not bother imputing the dependent variable? Any advice is appreciated.
I will answer my own question if it is permitted. It may not be the most elegant solution.
To recap, my data looked like
pid | choice | var | ... |
---|---|---|---|
1 | 1 | 1 | ... |
1 | 0 | 1 | ... |
2 | 1 | 0 | ... |
2 | 0 | 0 | ... |
3 | 0 | 1 | ... |
3 | 1 | 0 | ... |
... | ... | ... | ... |
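To use PROC MI successfully, I "rolled" the observations with the same pid into a single row, so that choice records which alternative was chosen and var1 and var2 hold the value of var for the first and second alternative. My input would then look something like this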
pid | choice | var1 | var2 | ... |
---|---|---|---|---|
1 | 1 | 1 | 1 | ... |
2 | 1 | 0 | 0 | ... |
3 | 2 | 1 | 0 | ... |
... | ... | ... | ... | ... |
for the same data as the first table. After PROC MI, I would split the rows into the format needed to use PROC MDC.
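The roll-up, imputation, and split steps described above could be sketched roughly as follows. This is only an illustration under several assumptions, not the exact code I ran: the data set names (`long`, `wide`, `wide_mi`, `long_mi`) are made up, there are exactly two alternatives per pid in a fixed order, `var` is binary as in the tables, and the wide-format choice variable is called `chosen` here (rather than `choice`) so it does not collide with the 0/1 `choice` variable when splitting back. The rule that infers the chosen alternative from a single observed row, and the choice of FCS logistic regression with five imputations, are also assumptions.

```sas
/* Step 1: roll the two rows per pid into one row.                 */
/* Assumes LONG has exactly two rows per pid, in alternative order. */
proc sort data=long;
   by pid;                      /* default sort keeps row order within pid */
run;

data wide (keep=pid chosen var1 var2);
   set long;
   by pid;
   retain chosen var1 var2 alt;
   if first.pid then do;        /* reset the carried values for each pid */
      alt = 0;
      chosen = .;
      var1 = .;
      var2 = .;
   end;
   alt = alt + 1;               /* position of this row within the pid */
   if alt = 1 then var1 = var;
   else if alt = 2 then var2 = var;
   /* A 1 on this row, or a 0 on the other row, identifies the choice. */
   if choice = 1 then chosen = alt;
   else if choice = 0 and chosen = . then chosen = 3 - alt;
   if last.pid then output;     /* one row per pid; chosen stays missing */
run;                            /* only if both rows were missing        */

/* Step 2: impute on the wide data, so each pid gets exactly one      */
/* chosen alternative. All three variables are treated as categorical. */
proc mi data=wide nimpute=5 seed=12345 out=wide_mi;
   class chosen var1 var2;
   fcs logistic(chosen) logistic(var1) logistic(var2);
   var chosen var1 var2;
run;

/* Step 3: split each imputed row back into two rows per pid. */
data long_mi (keep=_Imputation_ pid choice var);
   set wide_mi;
   array v{2} var1 var2;
   do alt = 1 to 2;
      choice = (chosen = alt);  /* 1 for the chosen alternative, else 0 */
      var = v{alt};
      output;
   end;
run;
```

Because the imputation happens on one row per pid, the two rows produced in step 3 are mutually exclusive by construction. The resulting `long_mi` data set carries the `_Imputation_` index that PROC MI adds, so a model such as PROC MDC would presumably be fit with `BY _Imputation_` and the results combined afterwards.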
It turns out this was pretty trivial; I was just in a state of tunnel vision!