Hi there,
I am trying to generate missingness for a summative scale. This involves (1) randomly selecting individuals who will be missing a summative score and (2) deleting individual items within the scale for the individuals selected. I am struggling with #2: every selected individual needs at least 1 item deleted, a pre-specified share needs all items deleted (9.17% missing all items), and individuals can have 1 to 5 missing items (it is a 5-item scale).
Probability of missing:
item1 = 0.2782
item2 = 0.3497
item3 = 0.3035
item4 = 0.3207
item5 = 0.3289
All items = 0.0917
Remaining probability of each item after accounting for all items being missing:
item1 = 0.1865
item2 = 0.2580
item3 = 0.2118
item4 = 0.2290
item5 = 0.2372
Essentially, I want to delete all items for 9.17% of the sample identified for missingness, likely based on a Bernoulli distribution as follows...
if js_Sel=. then sel_items=rand('BERNOULLI', 0.0917); else sel_items=0;
...and then, conditional on the full scale being missing (i.e. js_Sel=.) and not having all items missing (i.e. sel_items=0), use the remaining probabilities to delete individual items. However, if I do this with separate Bernoulli random variables, about 25% end up with no missing items at all (when every identified observation needs at least one item missing) and an extra 10% end up with all items missing.
Is there a way to create an array of Bernoulli random variables, based on the remaining probabilities, where at least 1 column needs to be =1 and it is not possible for all 5 columns to =1?
Thanks in advance!
Jillian
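One possible reading of the array question, as a minimal sketch rather than a recommended method: draw the five Bernoulli indicators independently and redraw the whole row until between 1 and 4 of them equal 1 (rejection sampling). The dataset name sim, the variables miss1-miss5 and n_miss, the seed, and the 1000 rows are all hypothetical. Be aware that discarding rows changes the marginal rates: each item's realized missing rate comes out higher than the probability passed to RAND, so this alone does not reproduce the target probabilities listed above.

```sas
/* Sketch only: rejection sampling, NOT a method that preserves the marginals. */
data sim;
   call streaminit(12345);                     /* arbitrary seed (assumption)      */
   array p[5] _temporary_ (0.1865 0.2580 0.2118 0.2290 0.2372);
   array miss[5] miss1-miss5;                  /* 1 = item is to be set to missing */
   do id = 1 to 1000;                          /* hypothetical number of rows      */
      do until (1 <= n_miss <= 4);             /* reject rows with 0 or 5 missing  */
         n_miss = 0;
         do j = 1 to 5;
            miss[j] = rand('BERNOULLI', p[j]);
            n_miss + miss[j];                  /* running count of missing items   */
         end;
      end;
      output;
   end;
   drop j;
run;

/* Compare the realized rates with the targets. */
proc means data=sim mean;
   var miss1-miss5;
run;
```

In this sketch the rows correspond to individuals with sel_items=0; the individuals with all items missing would still be handled by the separate 0.0917 draw.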
Hi there,
I am unable to post sample data - my apologies for the inconvenience.
For the table option, this would only allow one variable to be selected per draw though, correct? Several observations have multiple items deleted, so the probabilities summed across items are >1. When using the table function, don't the probabilities need to sum to 1, since only one variable is selected?
Thanks for your thoughts
I am not quite sure how to do that. Any general thoughts on how I could test the table random function on my data (or other alternatives)?
Thanks and again my apologies, Jillian
Hi @halladje,
If I understand your requirements correctly, you want to modify one existing dataset (by setting a number of variables to missing). So, your probabilities (0.2782, 0.3497, etc.) are actually expected relative frequencies in that dataset (after the modification).
The main issue is: Most of the probabilities you've specified are marginal probabilities, but constraints such as "a pre-specified number to have all items deleted (9.17% missing all items)" or "it is not possible for all 5 columns to =1" imply that the Bernoulli random variables you're trying to simulate are statistically dependent. This means you can't simply use RAND('bern',0.2782), RAND('bern',0.3497), etc. (or RAND('bern',0.1865), RAND('bern',0.258), etc. for that matter).
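To see the size of the problem, the following sketch works out what independent draws with the "remaining" probabilities would imply. It is only arithmetic on the numbers from the question; the variable names p_none, p_all and share_none are made up for illustration.

```sas
/* Sketch: pattern probabilities implied by INDEPENDENT Bernoulli draws. */
data _null_;
   array p[5] _temporary_ (0.1865 0.2580 0.2118 0.2290 0.2372);
   p_none = 1;                          /* P(no item missing)              */
   p_all  = 1;                          /* P(all five items missing)       */
   do i = 1 to 5;
      p_none = p_none * (1 - p[i]);
      p_all  = p_all  * p[i];
   end;
   /* Share of all identified observations with no missing item,          */
   /* given that 9.17% were already set to "all items missing".           */
   share_none = (1 - 0.0917) * p_none;
   put p_none=6.4 share_none=6.4 p_all=8.6;
run;
```

Worked by hand, p_none is about 0.28, so roughly 0.28 x 0.9083, or about 25% of the identified observations, would have no item missing under independence. That is in line with the figure reported in the question and shows that the constraint "at least one item missing" cannot be met this way.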
Maybe there is an additional issue: The relative frequencies would most likely differ from the specified probabilities due to random fluctuations. For example, on average, more than one out of ten selections from 1000 individuals using independent RAND('bern',0.3497) values will contain >368 individuals. Given the precision of the specified probabilities, you might not be happy with the results.
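The size of these fluctuations can be checked against the binomial distribution. The sketch below assumes 1000 independent draws with probability 0.3497 and uses the CDF function to get the chance that more than 368 individuals are selected; the variable names are illustrative only.

```sas
/* Sketch: how far can the count stray from 1000 * 0.3497 = 349.7? */
data _null_;
   n = 1000;
   p = 0.3497;
   expected = n * p;                               /* expected count        */
   sd       = sqrt(n * p * (1 - p));               /* binomial std. dev.    */
   p_gt_368 = 1 - cdf('BINOMIAL', 368, p, n);      /* P(count > 368)        */
   put expected=6.1 sd=6.2 p_gt_368=6.4;
run;
```

If the statement above holds, p_gt_368 should come out a little above 0.1, meaning the realized relative frequency exceeds 0.368 in more than one run out of ten.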
Here's an outline of how you could avoid both of these issues: