Dear SAS community,

I am trying to demonstrate the adverse effects of cluster sampling when the data within each cluster are correlated. For this I want to generate a population of 500 clusters (say, city districts), where each cluster has 1000 correlated observations (say, the income of each inhabitant). In a second step I want to draw samples from this population (simple random sample, cluster sample) and compare their characteristics.

I am struggling to create correlated data within the clusters. So far I have played around with the RAND function to create clusters and data points (code below). Any help on how to pre-define rho within the clusters would be very much appreciated. I hope this question is not too basic.

Thanks in advance!

/* Step 1: Generate a data set that contains 500 clusters, each with 1000 inhabitants */
%let N = 1000;          /* observations (inhabitants) per cluster */
%let NumSamples = 500;  /* number of clusters */
data LOR;
   call streaminit(123);
   do SampleID = 1 to &NumSamples;    /* ID variable for each LOR (cluster) */
      do IND = 1 to &N;               /* inhabitant index within the cluster */
         tetha = 1000 + SampleID*10;  /* average income in the cluster */
         Lampda = 100;                /* std. dev. within the cluster */
         INCOME_SPE = rand("Normal", tetha, lampda);
         output;
      end;
   end;
run;
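For reference, one common way to fix rho in advance is a random-intercept model: every inhabitant in a cluster shares one cluster-level random effect, and the intraclass correlation is then rho = sigma_b**2 / (sigma_b**2 + sigma_w**2), where sigma_b is the between-cluster and sigma_w the within-cluster standard deviation. The following is only a minimal sketch under that assumption, not a definitive solution. The macro variables rho, sigma and mu, their values, and the data set name LOR_ICC are hypothetical and chosen for illustration.

```sas
/* Sketch: clustered population with a pre-defined intraclass correlation. */
/* Assumed (hypothetical) settings: rho, sigma, mu and the name LOR_ICC.   */
%let N = 1000;          /* observations (inhabitants) per cluster */
%let NumSamples = 500;  /* number of clusters */
%let rho = 0.3;         /* assumed target intraclass correlation */
%let sigma = 100;       /* assumed total std. dev. of income */
%let mu = 1000;         /* assumed overall mean income */

data LOR_ICC;
   call streaminit(123);
   sigma_b = &sigma * sqrt(&rho);      /* between-cluster std. dev. */
   sigma_w = &sigma * sqrt(1 - &rho);  /* within-cluster std. dev. */
   do SampleID = 1 to &NumSamples;
      u = rand("Normal", 0, sigma_b);  /* one shared effect per cluster */
      do IND = 1 to &N;
         /* overall mean + cluster effect + individual noise */
         INCOME_SPE = &mu + u + rand("Normal", 0, sigma_w);
         output;
      end;
   end;
   drop sigma_b sigma_w u;
run;
```

With these settings the total variance stays at sigma**2 while the share rho of it comes from the cluster effect, so any two inhabitants of the same cluster have correlation rho and inhabitants of different clusters are uncorrelated. Setting rho to 0 gives a population without clustering for comparison.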