Hello everyone,
I am generating categorical data (X=1, X=2, X=3) with the same probability (.333,.333,.333). I want to ask if there is a way to simulate data for these three groups and ensure that the groups have the same number of observations.
Thanks,
DATA SIM;
call streaminit(331980);
totaln=&NSIM*&NOBS;
DO I=1 TO totaln;
X=RAND('Table',.333, .333.333);
D1=0; D2=0;
if X=1 THEN DO; D1=0; D2=0; end;
if X=2 THEN DO; D1=1; D2=0; end;
if X=3 THEN DO; D1=0; D2=1; end;
M = &i0m + (&a1*D1)+(&a2*D2)+(&errorm)*RAND('NORMAL');
XM=x*m;
D1M=D1*M;
D2M=D2*M;
Y= &i0y +(&cp1*D1)+(&cp2*D2)+(&b*M)+(&h1*D1M)+ (&h2*D2M)+(&errory)*RAND('NORMAL');
OUTPUT;
END;
Not with that code. The RAND function wants a comma between each parameter, and you are missing the comma before the last table value.
You also aren't accounting for the occasional 4 that the Table distribution will return: .333+.333+.333 = .999, so you will get X=4 with probability 0.001. I ran a million trials and got 972 4's in the result. That is where your missing M, XM, D1M, D2M and Y values are coming from. Was that intentional?
"Random" means that unless a process is deterministic (e.g. X=3 all the time), the observed group counts change from one sample to the next.
Please describe the purpose of forcing "random" group assignments to have the same number of observations.
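On the stray 4 mentioned above, here is a minimal sketch of one way to avoid it, assuming the three categories are meant to be exactly equiprobable. It writes the probabilities as 1/3 instead of .333 and lists only the first two, because RAND('Table') assigns whatever probability is left over to the next category. The data set name TableCheck and the trial count are illustrative, not from the original post. Note that this only removes the fourth category; the three group counts will still vary from run to run.

```sas
/* Sketch: three equiprobable categories with no stray X=4.        */
/* TableCheck and the 1,000,000 trials are illustrative choices.   */
data TableCheck;
   call streaminit(331980);
   do i = 1 to 1000000;
      /* Two probabilities listed; the remaining 1/3 goes to category 3 */
      X = rand('Table', 1/3, 1/3);
      output;
   end;
   keep X;
run;

/* Check that only X=1, 2, 3 occur */
proc freq data=TableCheck;
   tables X / nocum;
run;
```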
%let nobs=3000;
DATA SIM;
call streaminit(331980);
do n=1 to &nobs.;
x=rand('normal');output;
end;
run;
proc surveyselect data=sim out=want groups=3 seed=123;
run;
/*verify the result*/
proc freq data=want;
table GroupID;
run;
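For comparison, the same kind of balanced random split can be sketched by hand, assuming all that is needed is a random ordering followed by a cyclic group label. Each observation gets a uniform random key, the data are sorted by that key, and the group is then assigned as 1, 2, 3, 1, 2, 3, and so on. The names Shuffled, Balanced, and u are hypothetical and chosen only for this illustration.

```sas
%let nobs=3000;

/* Sketch: attach a random sort key to each observation */
data Shuffled;
   call streaminit(331980);
   do n = 1 to &nobs.;
      u = rand('Uniform');   /* random key used only for shuffling */
      output;
   end;
run;

/* Sorting by the random key puts the observations in random order */
proc sort data=Shuffled;
   by u;
run;

/* Cycle through groups 1, 2, 3 so each group gets nobs/3 observations */
data Balanced;
   set Shuffled;
   X = mod(_n_ - 1, 3) + 1;
run;

/* verify the result */
proc freq data=Balanced;
   tables X;
run;
```

With nobs=3000, each group receives 1000 observations; if nobs is not a multiple of 3, the group sizes differ by at most one.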
Sure, specify the number of groups and the number of observations in each, then do the simulation within each group. For example:
%let NSIM=10;
%let NGROUPS=3;
%let NOBSPERGROUP=4;
DATA SIM;
call streaminit(331980);
do SampleID = 1 to &NSIM;
do X = 1 to &NGROUPS; /* group ID */
do ObsNum = 1 to &NOBSPERGROUP;
/* put any computations here */
D1=0; D2=0;
if X=1 THEN DO; D1=0; D2=0; end;
if X=2 THEN DO; D1=1; D2=0; end;
if X=3 THEN DO; D1=0; D2=1; end;
M = &i0m + (&a1*D1)+(&a2*D2)+(&errorm)*RAND('NORMAL');
XM=X*m;
D1M=D1*M;
D2M=D2*M;
Y= &i0y +(&cp1*D1)+(&cp2*D2)+(&b*M)+(&h1*D1M)+ (&h2*D2M)+(&errory)*RAND('NORMAL');
OUTPUT;
end;
end;
end;
run;
proc freq data=sim;
tables SampleID*X / nocum norow nocol nopercent;
run;