<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: simulate with given variance-covariance in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561733#M27768</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp; might give you a hand.&lt;/P&gt;
&lt;P&gt;Rick wrote a blog about it before, although you need SAS/IML module .&lt;/P&gt;</description>
    <pubDate>Mon, 27 May 2019 12:34:32 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2019-05-27T12:34:32Z</dc:date>
    <item>
      <title>simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561689#M27766</link>
      <description>&lt;P&gt;Dear sasusers,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For a genome-wide association application, I would like to simulate data with a given variance-covariance structure. I want to simulate a vector with n observations. The variance is given by:&lt;/P&gt;
&lt;P&gt;var(Y) = K*sigma_p + I*sigma_e&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;K is a similarity matrix of dimension nxn and I is the unity matrix of dimension nxn. There is one fixed effect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I don't know where to start. Can somebody give me some hints or links?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;with kind regards,&lt;/P&gt;
&lt;P&gt;Veronique&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2019 08:32:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561689#M27766</guid>
      <dc:creator>vstorme</dc:creator>
      <dc:date>2019-05-27T08:32:35Z</dc:date>
    </item>
    <item>
      <title>Re: simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561719#M27767</link>
      <description>&lt;P&gt;You don't say what distribution you want to simulate with. If you are talking about a multivariate normal distribution, you can generate random observations from a multivariate normal distribution with specified covariance matrix using PROC SIMNORMAL. &lt;A href="https://blogs.sas.com/content/iml/2017/09/25/simulate-multivariate-normal-data-sas-simnormal.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2017/09/25/simulate-multivariate-normal-data-sas-simnormal.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2019 10:55:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561719#M27767</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-05-27T10:55:18Z</dc:date>
    </item>
    <item>
      <title>Re: simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561733#M27768</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp; might give you a hand.&lt;/P&gt;
&lt;P&gt;Rick wrote a blog about it before, although you need SAS/IML module .&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2019 12:34:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561733#M27768</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-05-27T12:34:32Z</dc:date>
    </item>
    <item>
      <title>Re: simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561972#M27780</link>
      <description>&lt;P&gt;See Section 8.3 of &lt;EM&gt;Simulating Data with SAS&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might also want to look at Section 12.3, especially p 232-235, for applications to mixed models. There is an errata for p. 233. The math is correct but the CS and R subscripts are reversed. The term \sigma^2_R on the diagonal is the residual covariance. The off-diagonal terms, which should be labeled as \sigma^2_{CS},&amp;nbsp; is the common covariance.&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2019 13:55:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/561972#M27780</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2019-05-28T13:55:21Z</dc:date>
    </item>
    <item>
      <title>Re: simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/563292#M27819</link>
      <description>&lt;P&gt;Based on chapter 12.3.2 and implementing Higham's algorithm, I simulated the following:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/******************
* test simulation *
******************/

* step 1: create 5 random SNPS;
**********************************;
* in practice you retrieve 5 snps that are located nearby and you assemble them in a dataframe;
data Categories(keep=V1--V5);
  call streaminit(4321);
  array p[3] (0.4 0.2 0.4); /* probabilities */
  do i = 1 to 100;
  V1 = rand("Table", of p[*]); /* use OF operator */
	V2 = rand("Table", of p[*]); /* use OF operator */
  V3 = rand("Table", of p[*]); /* use OF operator */
  V4 = rand("Table", of p[*]); /* use OF operator */
  V5 = rand("Table", of p[*]); /* use OF operator */
  output;
  end;
run;
proc print data=Categories;
run;
data snps (keep= snp1--snp5);
  set Categories;
	snp1 = V1-1;
	snp2 = V2-1;
	snp3 = V3-1;
	snp4 = V4-1;
	snp5 = V5-1;
	keep snp1--snp
run;
proc print data=snps;
run;
proc freq data=snps;
tables snp1 snp2 snp3 snp4 snp5/ nocum nopercent;
run;

* step2: create a kinship matrix, I will use the AR1 structure to get a quick sort of similarity matrix;
*******************************************************************************************;
proc iml;
/* return p x p matrix whose (i,j)th element is rho^|i - j| */
  start AR1Corr(rho, n); 
     return rho##distance(T(1:n), T(1:n), "L1");
  finish;
  /* test on 100 x 100 matrix with rho = 0.8 */
  rho = 0.8;  
  n = 100;
  kinship = AR1Corr(rho, n);
  print kinship[format=Best7.];

	*export to dataset;
  create dataK from kinship;
  append from kinship;
  close dataK;
quit;

* simulate a trait;
*******************;
* in practice read the kinship from csv file, create matrix from dataset;
proc iml;
  * import kinship from data;
  use work.dataK;                       /* open the data set */
  read all var _ALL_ into K; 
  close work.dataK;
  print(K); 

	* implement Higham's algorithm for computing the nearest correlation matrix to a given symmetric matrix;
	* Ref: https://blogs.sas.com/content/iml/2012/11/28/computing-the-nearest-correlation-matrix.html ;

	/* Project symmetric X onto S={positive semidefinite matrices}.
   Replace any negative eigenvalues of X with zero */
   start ProjS(X);
     call eigen(D, Q, X); /* note X = Q*D*Q` */
     V = choose(D&amp;gt;0, D, 0);
     W = Q#sqrt(V`);      /* form Q*diag(V)*Q` */
     return( W*W` );      /* W*W` = Q*diag(V)*Q` */
   finish;
 
  /* project square X onto U={matrices with unit diagonal}.
   Return X with the diagonal elements replaced by ones. */
  start ProjU(X);
     n = nrow(X);
     Y = X;
     diagIdx = do(1, n*n, n+1);
     Y[diagIdx] = 1;      /* set diagonal elements to 1 */
     return ( Y );
  finish;
 
  /* Helper function: the infinity norm is the max abs value of the row sums */
  start MatInfNorm(A);
     return( max(abs(A[,+])) );
  finish;
 
  /* Given a symmetric correlation matrix, A, 
   project A onto the space of positive semidefinite matrices.
   The function uses the algorithm of Higham (2002) to return
   the matrix X that is closest to A in the Frobenius norm. */
  start NearestCorr(A);
     maxIter = 100; tol = 1e-8;  /* parameters...you can change these */
     iter = 1;      maxd   = 1;  /* initial values */ 
     Yold = A;  Xold = A;  dS = 0;
 
     do while( (iter &amp;lt;= maxIter) &amp;amp; (maxd &amp;gt; tol) );
       R = Yold - dS; /* dS is Dykstra's correction */
       X = ProjS(R);  /* project onto S={positive semidefinite} */
       dS = X - R;
       Y = ProjU(X);  /* project onto U={matrices with unit diagonal} */
 
       /* How much has X changed? (Eqn 4.1) */
       dx = MatInfNorm(X-Xold) / MatInfNorm(X);
       dy = MatInfNorm(Y-Yold) / MatInfNorm(Y);
       dxy = MatInfNorm(Y - X) / MatInfNorm(Y);
       maxd = max(dx,dy,dxy);
 
       iter = iter + 1;
       Xold = X; Yold = Y; /* update matrices */
     end;
     return( X ); /* X is positive semidefinite */
  finish;

	KK = NearestCorr(K);
	  /* for large matrices, might need to correct for numerical rounding errors */
  epsilon = 1e-10;
  KKK = ProjU( KK/(1+epsilon) );  /* divide off-diagonal elements by 1+epsilon */
  
	* import snps from data;
  use work.snps;                       /* open the data set */
  read all var _ALL_ into GT; 
  close work.snps;
  print(GT);
  
  snp1 = GT[,1]; 
	snp2 = GT[,1]; 
  snp3 = GT[,1]; 
  snp4 = GT[,1]; 
  snp5 = GT[,1]; 

	* set parms;
	n=100;                            /* 100 subjects  */
  sigma2_g = 0.1;                 /* Parameter 1: residual covariance */
  sigma2_e = 0.5;
  beta = {1, 0.5, 0.7, 3, 0.7, 0.5}; /* ratio data, value of 1 for intercept*/

	* varcov;
	G = sigma2_g*KKK ;
	R = sigma2_e*I(n);
	varcov = G + R ;

	* subjectIDs;
	treeID = colvec(1:n);
	*print(treeID);

	* design matrix;
	X = J(n,1,1)||GT ; /*col vector of 1 for intercept*/

	* simulate trait;
	call randseed(1234);
  create Mix var {"treeID" "Trait1" "snp1" "snp2" "snp3" "snp4" "snp5"};     
	          /* create dataset testsim  */

  * Note:  if you are trying to simulate random multivariate normal data, 
	you must use a positive definite matrix.;

  zero = j(1, n, 0);            /* the zero vector                  */
  eps = RandNormal(1, zero, varcov);   /* eps ~ MVN(0,R)                   */
  trait1 = X*beta + eps`;         /* fixed effects, correlated errors */

  append; 
  close;
quit;
	

* analysis;
**************;
proc sgplot data=Mix;
  title "distribution parameters";
  histogram trait1;
  density trait1;
  density trait1/type=kernel;
run;
* you clearly see a biodal distribution;

* analysis;
**********************************;
* the kinship data set must have in
 - column 1 : the treeIDs 
 - column 2 : parm (all 1's)
 - column 3 : values from 1 to n ;
data kin ;
  retain treeID parm row ;
  set datak;
	treeID = _N_ ;
	parm = 1 ;
	row = _N_ ;
run;
Proc mixed data=Mix METHOD=REML noprofile covtest itdetails ;
		class treeID  ;
		model trait1 = snp1 snp2 snp3 snp4 snp5 /solution  ddfm=KR chisq;
		random treeID / type=lin(1) ldata=work.kin ;
		parms/ lowerb= 0 0;
		ods output covParms=myParms;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I do however not have the expected results. I can't see what I did wrong.&lt;/P&gt;
&lt;P&gt;Despite the use of Higham's algorithm, I get the notes:&lt;/P&gt;
&lt;P&gt;An infinite likelihood is assumed in iteration 1 because of a nonpositive definite&lt;BR /&gt;estimated V matrix for Subject 1.&lt;BR /&gt;NOTE: Convergence criteria met.&lt;BR /&gt;NOTE: Estimated G matrix is not positive definite.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2019 15:19:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/563292#M27819</guid>
      <dc:creator>vstorme</dc:creator>
      <dc:date>2019-06-03T15:19:12Z</dc:date>
    </item>
    <item>
      <title>Re: simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/563310#M27820</link>
      <description>&lt;P&gt;When you are developing code, don't begin by using the full size problem (100 dimensions). Use a smaller example.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your program:&lt;/P&gt;
&lt;P&gt;1. You don't need NearestCorr (Higham's algorithm) because the AR1Corr function returns a positive definite kinship matrix.&lt;/P&gt;
&lt;P&gt;2. I think your main problem is that you are assigning snp1 = snp2 = ... = snp5 = GT[,1].&amp;nbsp; If these are supposed to be covariates in the mixed model, they can't be equal or you have a singular design matrix. I think the ERROR is coming from this mistake. Also, shouldn't they be CLASS variables in PROC MIXED?&lt;/P&gt;
&lt;P&gt;3. You are currently generating one random variate from MVN(0, varcov. I assume this should be 100 variates:&lt;BR /&gt;eps = RandNormal(100, zero, varCov).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2019 15:51:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/563310#M27820</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2019-06-03T15:51:25Z</dc:date>
    </item>
    <item>
      <title>Re: simulate with given variance-covariance</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/563463#M27822</link>
      <description>&lt;P&gt;Thanks a lot Rick, this was really helpful, I found my mistakes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your program:&lt;/P&gt;
&lt;P&gt;1. You don't need NearestCorr (Higham's algorithm) because the AR1Corr function returns a positive definite kinship matrix.&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;STRONG&gt;Here it is not needed, but I might need it when I am using the true similarity matrix&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;2. I think your main problem is that you are assigning snp1 = snp2 = ... = snp5 = GT[,1].&amp;nbsp; If these are supposed to be covariates in the mixed model, they can't be equal or you have a singular design matrix. I think the ERROR is coming from this mistake. Also, shouldn't they be CLASS variables in PROC MIXED?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;I overlooked this, it should be snp2=GT[,2], snp3=GT[,3],...Now, I get the expected results&amp;nbsp;&lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;I assume a dosage effect of the alleles, snp is either 0 (no allele effect, snp=1 effect of 1 allele, snp=2, effect of 2 alleles), hence they are not class variables.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;3. You are currently generating one random variate from MVN(0, varcov. I assume this should be 100 variates:eps = RandNormal(100, zero, varCov).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;I think eps = RandNormal(1, zero, R) is ok. Trait has dimension 100x1, X(100x6), beta(6x1), and thus eps(100x1). This is similar to the example 12.3.2.2. in your book on simulation.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2019 08:20:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/simulate-with-given-variance-covariance/m-p/563463#M27822</guid>
      <dc:creator>vstorme</dc:creator>
      <dc:date>2019-06-04T08:20:27Z</dc:date>
    </item>
  </channel>
</rss>

