@halladje You have chosen RAND('Table',x1,x2,x3,x4,x5,x6,x7). My understanding of RAND('Table',p1,p2,...,pn) is as follows. My interpretation might be wrong, but I have followed the official documentation as here.

You are passing a list of N probabilities (7 in your case). Each call returns one index from that list, and the chance of a given index being returned is the probability stored at that index. In other words, if x1 is larger than x2, the output will be 1 more often than 2. Please see the documentation link shared above.

For example, I created a demo data table with 100 rows where each row's output came from RAND('Table', 0.05, 0.1, 0.15, 0.22, 0.23, 0.2499). Here index 1 has the lowest probability, index 2 a higher one, index 3 a higher one still, and index 6 the highest. Please also note that they add up to very nearly 1 (0.9999). So I expected 1, 2, 3, 4, 5 and 6 as the output, in roughly the proportions passed to the function. The output is shown below, followed by the code used to generate the data.

Output percentage of Number matches the probability table

DATA DEMO;
INPUT OutputNum 2. @@;
OutputNum = RAND('Table',0.05,0.1,0.15,0.22,0.23,0.2499);
DATALINES;
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
;
RUN;
PROC TABULATE DATA=DEMO;
CLASS OutputNum;
TABLE OutputNum*PCTN;
RUN;

I am sorry if you knew all this. I just used the previous paragraphs as stepping stones to answer your questions, as below.

You asked: "I am also wondering if the probabilities have to add up to 1 across all variables in a table?"

The documentation shared above clearly says that the sum of the probabilities can differ from 1, but the set of integers that can be returned then changes accordingly.

You asked: "Do I need to scale these probabilities so they=1 or will they automatically be scaled (the resulting distributions of random numbers from this code is accurate)."

I don't think that will give you what you want, but that is only my opinion.

You asked: "I am wondering if there is a way to select 2 or 3 variables from the list to be missing. i.e. a multivariate Bernoulli random list where you set 2 or 3 to be missing?"

Since you are using essentially the same probabilities for both of the missing flags that you are trying to simulate, it is quite natural to expect instances where the two flag numbers are the same. In fact, the sum of 0.103, 0.061, 0.620, 0.342, 0.324, 0.218 and 0.230 is 1.898, well above 1, and according to the rules described in the documentation the output will only ever take one of four values (1, 2, 3 and 4). If you really want entirely non-overlapping numbers, ballardw has already given you very nice advice. Please follow that.

I am sorry if you knew all this already and I underestimated your problem or misread the requirement completely. I thought I should let you know about this so that you not only take advantage of the advice already given but also know a little more about what is going on with RAND('Table').
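If you would like to check the behaviour yourself, something along the lines of the sketch below might help. It draws from both probability lists discussed above and tabulates the results. The data set name CHECK, the variable names SumBelow1 and SumAbove1, the seed and the 10,000 draws are all my own arbitrary choices for illustration, not anything from your code. If I have read the documentation correctly, SumAbove1 should never come out as 5, 6 or 7, but please confirm that on your own installation.

```sas
/* Sketch only: names, seed and number of draws are illustrative assumptions */
DATA CHECK;
CALL STREAMINIT(12345);   /* fix the seed so the run is repeatable */
DO i = 1 TO 10000;
   /* probabilities sum to 0.9999, just under 1 */
   SumBelow1 = RAND('Table',0.05,0.1,0.15,0.22,0.23,0.2499);
   /* probabilities sum to 1.898, well over 1 */
   SumAbove1 = RAND('Table',0.103,0.061,0.620,0.342,0.324,0.218,0.230);
   OUTPUT;
END;
DROP i;
RUN;

/* One frequency table per variable shows which index values actually occur */
PROC FREQ DATA=CHECK;
TABLES SumBelow1 SumAbove1;
RUN;
```

A DO loop with OUTPUT is used here instead of the DATALINES block of zeros only because it makes the number of rows easy to change.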