My goal is to take the correlation matrix from an existing (empirical) multivariate dataset and use it to generate a centered and standardized (mean=0, SD=1) simulated dataset. The code I use is copied below.

So far I have passed the correlation matrix of my real data to RandNormal. When I do, the output dataset looks exactly as one would expect: means around 0, SDs around 1, and the same correlation structure as the original dataset.

However, I realize RandNormal was originally intended to accept the covariance matrix, not the correlation matrix, as its input. When I pass the covariance matrix instead, I get some unexpected results: the standard deviations of the simulated variables now vary quite a bit, from 0.39 to 1.09, though the means still hover around 0 and the simulated correlation matrix is as expected.

My question is: why do the standard deviations of my simulated data vary so much more when I use the covariance matrix, and how can I account for this? I am also concerned that the data generated with the correlation matrix may contain unexpected linear dependencies.

Here is the code I use, which I obtained both from this forum and from The Do Loop blog (http://blogs.sas.com/content/iml/):

proc iml;
call randseed(4321);

/* specify population mean and covariance */
/* <------ here I use either the correlation or the covariance matrix.
   The cov matrix is poorly standardized. */
use simfin.covmat;
read all var _num_ into Cov[c=varNames];   /* save var names */
close simfin.covmat;                       /* close the same data set that was opened */

Mean = j(nrow(Cov), 1, 0);                 /* zero vector */
N = 500;                                   /* sample size */
NumSamples = 1;                            /* number of samples/replicates */

X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
varNames = "ID" || varNames;               /* concatenate "ID" to var names */

create MVN from Z[c=varNames];
append from Z;
close MVN;
quit;
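For reference, here is how I understand the two inputs to relate. This is only a sketch of the standard covariance/correlation identity, under the assumption that RandNormal draws from a multivariate normal with exactly the matrix it is given. The symbols $\Sigma$ (covariance matrix), $R$ (correlation matrix), $D$ and $\sigma_i$ are my own notation and do not appear in the code above.

```latex
\Sigma = D\,R\,D, \qquad
D = \operatorname{diag}(\sigma_1, \dots, \sigma_p), \qquad
R = D^{-1}\,\Sigma\,D^{-1}

X \sim N(0, \Sigma) \;\Longrightarrow\;
\operatorname{sd}(X_i) = \sqrt{\Sigma_{ii}} = \sigma_i, \qquad
\operatorname{corr}(X_i, X_j) = \frac{\Sigma_{ij}}{\sigma_i \sigma_j} = R_{ij}
```

If that assumption holds, passing $R$ (whose diagonal is all 1) should give unit standard deviations, while passing $\Sigma$ should give standard deviations equal to the square roots of its diagonal, with the same correlations either way. That would match what I see, provided the diagonal of my covariance matrix really does correspond to SDs between about 0.39 and 1.09. I have not verified this.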