Good Morning,
I am trying to create a synthetic bivariate database where the outcome is influenced by calculated correlations. It has roughly 10 or more variables. I am using RanMBIN, but I continue to get violations. Does anyone have experience with this who could assist me?
Thank you,
Spaxxs
Chapter 9 of Wicklin (2013) explains why some sets of correlations are not feasible. "Not feasible" means that the combination of means and correlations that you specified is not possible under the distributions that the macro uses.
The macro you are using supports a limited number of correlation structures: compound symmetric, AR(1), and banded. It is possible that your parameters are not feasible for those structured correlation matrices, but are feasible for more general correlations. The algorithms in Wicklin (2013) use the Emrich-Piedmonte algorithm, which enables you to fit arbitrary correlation structures (but you still have to specify feasible parameters). However, RANMBIN requires only Base SAS whereas the methods in Wicklin (2013) require a SAS/IML license.
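As a rough illustration of what feasibility means, the sketch below checks one pair of binary variables against the standard pairwise bounds on the correlation of two Bernoulli variables. The means p1 and p2 and the requested correlation rho are made-up values, not the poster's parameters, and this check is not part of RANMBIN. Passing it for every pair is necessary, but it does not guarantee that the full correlation matrix is feasible.

```sas
/* Sketch only: p1, p2, and rho are hypothetical example values */
data _null_;
   p1 = 0.2;  p2 = 0.7;      /* means of the two binary variables */
   rho = 0.5;                /* requested correlation */
   q1 = 1 - p1;  q2 = 1 - p2;
   /* Smallest and largest correlation that two Bernoulli
      variables with these means can have */
   lower = max(-sqrt((p1*p2)/(q1*q2)), -sqrt((q1*q2)/(p1*p2)));
   upper = min( sqrt((p1*q2)/(p2*q1)),  sqrt((p2*q1)/(p1*q2)));
   feasible = (lower <= rho <= upper);   /* 1 = inside the bounds */
   put lower= upper= feasible=;
run;
```

With these example values the upper bound is about 0.33, so a requested correlation of 0.5 is flagged as not feasible.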
@Levi_M You will have better success getting a reply if you could provide more information.
Here's a link to a SAS Note on RANMBIN:
Sample 66969: Generate multivariate binary data with specified means and correlation matrix
The more information you can provide, the better we can assist.
Hello,
I had never heard about %RanMBIN.
@Rick_SAS : you know that one?
But I know about this (related) blog from @Rick_SAS :
Tips to simulate binary and categorical variables
By Rick Wicklin on The DO Loop November 2, 2020
https://blogs.sas.com/content/iml/2020/11/02/simulate-binary-and-categorical-variables.html
Cheers,
Koen
Thank you for the feedback. I have attached the 4 items that I am using for the dataset generation.
Hopefully this helps?
How about this one? It uses a genetic algorithm.
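/* Target correlation matrix for 4 variables */
data corr;
   infile cards expandtabs;
   input x1-x4;
   cards;
1 0.452159638 0.220107738 0.412390423
0.452159638 1 0.080503668 0.366678316
0.220107738 0.080503668 1 0.0022723
0.412390423 0.366678316 0.0022723 1
;

proc iml;
   /* Read the target matrix; vname holds the variable names */
   use corr;
   read all var _num_ into corr[c=vname];
   close;

   /* Objective: sum of squared differences between the sample
      correlation of the candidate data and the target matrix */
   start function(x) global(ncol,corr);
      temp=corr(shape(x,0,ncol));
      sse=ssq(temp-corr);
      return (sse);
   finish;

   nobs=1000;                /* number of rows to generate */
   ncol=ncol(corr);
   size=nobs#ncol;           /* one gene per data cell */

   /* Lower bound 0 and upper bound 1 for every gene */
   bounds=j(2,size,0);
   bounds[2,]=1;

   id=gasetup(2,size,123456789);     /* encoding 2: integer-valued vector */
   call gasetobj(id,0,"function");   /* 0 = minimize the objective */
   call gasetsel(id,10,1,.95);       /* keep 10 elite members */
   call gainit(id,10000,bounds);     /* initial population of 10000 */

   niter = 200;
   do i = 1 to niter;
      call garegen(id);              /* produce the next generation */
      call gagetval(value, id);
   end;

   /* Take the best member and reshape it into an nobs x ncol data set */
   call gagetmem(mem, value, id, 1);
   want=shape(mem,0,ncol);
   create want from want[c=vname];
   append from want;
   close;

   print value[l = "Min value (closer to zero is better)"];
   call gaend(id);
quit;

/* Compare the achieved correlations with the target matrix */
proc corr data=want pearson;
   var _numeric_;
run;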