## Simulating categorical data, same observations per group.

Hello everyone,

I am generating a categorical variable X with three levels (X=1, X=2, X=3), each with the same probability (.333, .333, .333). Is there a way to simulate data for these three groups and ensure that every group has the same number of observations?

Thanks,

```
DATA SIM;
call streaminit(331980);
totaln=&NSIM*&NOBS;
DO I=1 TO totaln;
   X=RAND('Table',.333, .333.333);
   D1=0; D2=0;
   if X=1 THEN DO; D1=0; D2=0; end;
   if X=2 THEN DO; D1=1; D2=0; end;
   if X=3 THEN DO; D1=0; D2=1; end;
   M = &i0m + (&a1*D1)+(&a2*D2)+(&errorm)*RAND('NORMAL');
   XM=x*m;
   D1M=D1*M;
   D2M=D2*M;
   Y= &i0y +(&cp1*D1)+(&cp2*D2)+(&b*M)+(&h1*D1M)+ (&h2*D2M)+(&errory)*RAND('NORMAL');
   OUTPUT;
END;
```


## Re: Simulating categorical data, same observations per group.

Not with that code. The RAND function requires a comma between each parameter, and you are missing the comma before the last table value.

You also aren't accounting for the occasional 4 that the Table distribution returns: because .333 + .333 + .333 = .999, you will get X=4 with probability 0.001. I ran a million trials and got 972 4's in the result. That is where your missing M, XM, D1M, D2M, and Y values are coming from. Was that intentional?
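To see this for yourself, here is a minimal sketch (the data set name `check` and the variable names `x_bad` and `x_fixed` are just placeholders I made up). It relies on the documented behavior that `RAND('Table', ...)` assigns any leftover probability to one extra category, so passing only two probabilities of 1/3 gives the third level the remainder and a 4 can never occur.

```
data check;
   call streaminit(331980);
   do i = 1 to 1000000;
      /* probabilities sum to 0.999, so category 4 gets the leftover 0.001 */
      x_bad = rand('Table', .333, .333, .333);
      /* only two probabilities given: category 3 gets the remaining 1/3 */
      x_fixed = rand('Table', 1/3, 1/3);
      output;
   end;
run;

/* x_bad should show a small share of 4's; x_fixed should show only 1, 2, 3 */
proc freq data=check;
   tables x_bad x_fixed / nocum;
run;
```

Note that this only removes the stray 4's. The counts for levels 1, 2, and 3 will still differ from run to run, because the assignment is still random.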

"Random" means that, unless the process is deterministic (e.g., X=3 all the time), the observed distribution varies from one sample to the next.

Please describe why you need the "random" group assignments to have exactly the same number of observations.

## Re: Simulating categorical data, same observations per group.

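```
%let nobs=3000;

/* simulate the observations first, without any group variable */
DATA SIM;
call streaminit(331980);
do n=1 to &nobs.;
   x=rand('normal');output;
end;
run;

/* randomly split the observations into 3 groups; */
/* the group number is written to the variable GroupID */
proc surveyselect data=sim out=want groups=3 seed=123;
run;

/*verify the result*/
proc freq data=want;
table GroupID;
run;
```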
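If you would rather stay with DATA steps and PROC SORT, the same idea can be sketched by hand: attach a random sort key, shuffle the rows, and deal them out to the groups in turn. This is only an illustration, not a replacement for the PROC SURVEYSELECT approach; the data set names `sim2` and `want2` and the variable `u` are placeholders, and it assumes the number of observations is a multiple of 3.

```
%let nobs=3000;   /* assumed to be a multiple of 3 */

data sim2;
   call streaminit(331980);
   do n = 1 to &nobs.;
      u = rand('Uniform');   /* random sort key */
      output;
   end;
run;

/* sorting by the random key shuffles the rows */
proc sort data=sim2;
   by u;
run;

/* deal the shuffled rows out as 1,2,3,1,2,3,... */
data want2;
   set sim2;
   X = mod(_n_ - 1, 3) + 1;
run;

/* each level of X should have exactly &nobs/3 observations */
proc freq data=want2;
   tables X;
run;
```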

## Re: Simulating categorical data, same observations per group.

Sure, specify the number of groups and the number of observations in each, then do the simulation within each group. For example:

```
%let NSIM=10;
%let NGROUPS=3;
%let NOBSPERGROUP=4;

DATA SIM;
call streaminit(331980);
do SampleID = 1 to &NSIM;
   do X = 1 to &NGROUPS;    /* group ID */
      do ObsNum = 1 to &NOBSPERGROUP;
         /* put any computations here */
         D1=0; D2=0;
         if X=1 THEN DO; D1=0; D2=0; end;
         if X=2 THEN DO; D1=1; D2=0; end;
         if X=3 THEN DO; D1=0; D2=1; end;
         M = &i0m + (&a1*D1)+(&a2*D2)+(&errorm)*RAND('NORMAL');
         XM=X*m;
         D1M=D1*M;
         D2M=D2*M;
         Y= &i0y +(&cp1*D1)+(&cp2*D2)+(&b*M)+(&h1*D1M)+ (&h2*D2M)+(&errory)*RAND('NORMAL');
         OUTPUT;
      end;
   end;
end;
run;

proc freq data=sim;
tables SampleID*X / nocum norow nocol nopercent;
run;
```