Hello,

I am trying to generate missingness in a dataset that resembles the pattern of missingness in the complete sample. I have determined the number of observations that have 2 missing variables and the probability of each variable being missing. So far I have written code that randomly selects 1 variable to be missing, using the probabilities of missingness:

    data complete_miss3;
       set complete_miss2;
       pmale=0.103; pagyrs=0.061; ped=0.620; pint=0.342; pext=0.324; pov=0.218; psel=0.230;
       nmiss_flag1a=rand("Table", pmale, pagyrs, ped, pint, pext, pov, psel);
       nmiss_flag1b=rand("Table", pmale, pagyrs, ped, pint, pext, pov, psel);
    run;

    proc freq data=complete_miss3;
       tables nmiss_flag1a nmiss_flag1b;
    run;

The issue is that nmiss_flag1a and nmiss_flag1b often select the same variable. Is there a way to select 2 or 3 distinct variables from the list to be missing, i.e. a multivariate Bernoulli random list where you set 2 or 3 to be missing?

I am also wondering whether the probabilities have to add up to 1 across all variables in a table. These probabilities are based on the observations missing 1 or 2 variables (after deleting observations with particular common patterns). After accounting for common patterns of missingness, less than 1% of the sample was missing only 1 item, so I have combined those with the observations missing 2 items for the simulation (i.e. the probabilities of missingness are based on the sample missing 1 or 2 variables and do not add up to 1). Do I need to scale these probabilities so they sum to 1, or will they be scaled automatically (the resulting distributions of random numbers from this code are accurate)?

Thanks so much,
Jillian
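One possible way to get two distinct variables per observation is to draw sequentially without replacement: rescale the weights so they sum to 1, draw once with rand("Table"), set the chosen variable's weight to 0, rescale again, and draw a second time. The sketch below is only an illustration under stated assumptions, not a confirmed solution. The seed, the working array w, and the helper variables j, k, total and draw are made up for this example. As far as I know from the SAS documentation, rand("Table") does not rescale its arguments: once the running sum of the probabilities reaches 1, later categories are never returned, so with the values above (running sum 1.126 at pint) it may be worth checking whether values 5 to 7 ever appear in the PROC FREQ output.

```sas
data complete_miss3;
   set complete_miss2;
   call streaminit(12345);   /* arbitrary seed (assumption), for reproducibility */

   /* Original missingness weights: male, agyrs, ed, int, ext, ov, sel */
   array p[7] _temporary_ (0.103 0.061 0.620 0.342 0.324 0.218 0.230);
   array w[7] _temporary_;                    /* working copy, reset for every row */
   array pick[2] nmiss_flag1a nmiss_flag1b;   /* use pick[3] with a third flag for 3 draws */

   do j = 1 to dim(p);
      w[j] = p[j];
   end;

   do k = 1 to dim(pick);
      /* Rescale the remaining weights so they sum to 1 */
      total = sum(of w[*]);
      do j = 1 to dim(w);
         w[j] = w[j] / total;
      end;

      /* Redraw in the rare case that rounding makes Table return n+1 */
      do until (draw <= dim(w));
         draw = rand("Table", of w[*]);
      end;

      pick[k] = draw;
      w[draw] = 0;           /* this variable cannot be selected again */
   end;

   drop j k total draw;
run;
```

With this scheme nmiss_flag1a and nmiss_flag1b can never be equal. Note that the share of observations in which a given variable ends up missing will be related to, but not exactly proportional to, the original weights, so comparing the simulated frequencies against the target pattern would be a sensible check.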