I am trying to write code for "Phase one analysis of multivariate control charts with missing data".
I have completed the first and second steps, but I am stuck on the third step, where I should remove k% of the data (general pattern, MCAR).
I cannot work out how to do it.
How do I set k% of the values in simulated data to missing?
proc iml;
call randseed(1);
N = 100; /* number of time points */
t = 1:N;
X = j(1, N); /* allocate vector */
call randgen(X, "Normal"); /* fill with random normal variates */
/* approximately 20% missing completely at random */
missIdx = sample( t, 0.2*N ); /* sample with replacement */
X[missIdx] = .;
call scatter(t, X) other="refline 0/axis=y;";
/* alternatively, sample without replacement (instead of the SAMPLE call above) to get exactly 20% missing */
missIdx = sample( t, 0.2*N, "NoReplace" );
X[missIdx] = .;
This code handles univariate data, but I need code for multivariate data.
Use the same program, where N = rows*columns is the total number of elements in the matrix. For example, if you want a 5x20 matrix use the previous code with N=100 and then reshape the vector into a matrix by using the SHAPE function:
Y = shape(X, 5, 20); /* convert to matrix */
If you prefer to rewrite the whole program in terms of rows and columns, you can do that, too:
proc iml;
call randseed(1);
N = 100; /* number of time points (columns) */
p = 5; /* number of rows */
X = j(p, N); /* allocate matrix */
call randgen(X, "Normal"); /* fill with random normal variates */
/* approximately 20% missing completely at random */
t = 1:(N*p); /* vector that contains all indices */
missIdx = sample( t, 0.2*N*p ); /* sample with replacement */
X[missIdx] = .;
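If you need exactly k% of the cells to be missing for an arbitrary k, the same idea can be wrapped in a small module. The following is only a sketch: the module name MakeMissing and its arguments are hypothetical (they are not built-in SAS/IML functions), and it assumes that "k%" means round(k*n*p) cells chosen without replacement from all cells of the matrix.

```sas
proc iml;
/* Hypothetical helper (not a built-in): return a copy of X in which
   round(k * number of cells) cells, chosen completely at random
   without replacement, are set to missing. k is a proportion in [0,1]. */
start MakeMissing(X, k);
   nCells = nrow(X) * ncol(X);
   nMiss  = round(k * nCells);          /* SAMPLE needs an integer sample size */
   Y = X;                               /* copy so the caller's matrix is unchanged */
   if nMiss > 0 then do;
      idx = sample(1:nCells, nMiss, "NoReplace");
      Y[idx] = .;
   end;
   return( Y );
finish;

call randseed(1);
X = randnormal(20, {0 0}, {1 0, 0 1});  /* 20 x 2 multivariate normal data */
Y = MakeMissing(X, 0.10);
pctMiss = countmiss(Y) / (nrow(Y)*ncol(Y));  /* realized proportion missing */
print pctMiss;
quit;
```

The nMiss > 0 check lets a small k return the data unchanged instead of calling SAMPLE with a sample size of zero.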
If you plan on doing a lot of simulations, I recommend the book Simulating Data with SAS.
I tried it, but I get an error.
1. Do you want uncorrelated or correlated observations? It looks like you are trying to use the RANDNORMAL function. If so, see the program below for the correct syntax.
2. The second argument to the SAMPLE function is the sample size, which is the number of indices in the range 1:n*p. This value must be greater than zero.
proc iml;
mean={0 0};
cov={1 0,
0 1};
p = ncol(mean); /* number of variables */
m=20; /* number of observations */
X=randnormal(m,mean,cov); /* multivariate normal */
t= 1:(m*p);
/* 2nd argument is not a probability, it is the number of times to
sample from t. Therefore the second argument must be an integer > 0 */
missIdx= sample(t, round(0.1*m*p) );
X[missIdx]=.;
If you want to allow the possibility that no elements are set to missing, you can use a different method for generating the missing values. In the following statements, an m x p random matrix is generated where each cell has probability=0.01 of being 1.
/* Different approach: Each cell has 0.01 probability of being missing. */
B = j(m, p);
call randgen(B, "Bernoulli", 0.01);
missIdx=loc(B=1);
if ncol(missIdx)>0 then X[missIdx]=.;
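To see how the missing cells produced by this approach are spread across the variables, you can count them afterward. The following is a minimal, self-contained sketch; the proportion k and the dimensions are assumed values chosen only for illustration.

```sas
proc iml;
call randseed(1);
m = 20;  p = 2;                     /* assumed: 20 observations, 2 variables */
k = 0.10;                           /* assumed per-cell probability of being missing */
X = randnormal(m, j(1,p,0), I(p));  /* uncorrelated multivariate normal data */

B = j(m, p);                        /* allocate indicator matrix */
call randgen(B, "Bernoulli", k);    /* each cell is 1 with probability k */
missIdx = loc(B=1);
if ncol(missIdx)>0 then X[missIdx] = .;

missPerVar = countmiss(X, "col");   /* number of missing values in each column */
missTotal  = countmiss(X);          /* total number of missing values */
print missPerVar, missTotal;
quit;
```

With this method the realized proportion of missing cells varies from sample to sample around k, whereas sampling indices without replacement fixes the count in advance.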