I am simulating data for a two-group trial with 100 subjects in each group. There are four variables: GROUP, SUBJNO, SEX, and AGE. SEX and AGE are covariates for the subsequent model.
Here is my code:
%let seed=12345;
data tab1;
  call streaminit(&seed.);
  length subjno $4 sex $1;
  do _n_=1 to 100;
    group=1;
    subjno='1'||put(_n_,z3.);
    sex=choosec(rand('table',0.5,0.5),'M','F');
    age=rand('integer',18,70);
    output;
  end;
  do _n_=1 to 100;
    group=2;
    subjno='2'||put(_n_,z3.);
    sex=choosec(rand('table',0.5,0.5),'M','F');
    age=rand('integer',18,70);
    output;
  end;
run;
Unfortunately, the initial random seed, 12345, produces a statistically significant difference in SEX between the groups: the chi-square p-value is 0.0477.
I have tried other seed values such as 123, 1234, 123456, and 1234567, and none of them produces a significant difference in SEX between the groups.
I know that a significant difference in a covariate can occur by chance. Is there a way to ensure that there are no statistically significant differences between groups in the covariates when simulating data?
Maybe block randomization with the covariates as blocking factors? What about a continuous covariate such as AGE?
Is this what you want?
proc plan seed=123;
  factors group=2 ordered subj=100 / noprint;
  output out=tab1;
quit;

data tab2;
  set tab1;
  if subj in (1:50) then sex='F';
  else sex='M';
run;

proc freq data=tab2;
  table group*sex / chisq;
run;
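The PROC PLAN code above fixes SEX at exactly 50/50 in each group but does not touch AGE. For a continuous covariate, one possible approach (a sketch only, not necessarily what you need) is to simulate the covariates for all 200 subjects first, sort them so that neighbouring subjects have the same SEX and a similar AGE, and then split each consecutive pair at random between the two groups. The data set name pool and the variables id, u, first_group and seq are made up for this illustration, and the second seed is an arbitrary choice. Keep in mind that forcing balance this way means the data no longer behave like a simple randomization, so whether it is appropriate depends on what the simulation is for.

```sas
%let seed=12345;

/* Step 1: simulate covariates for all 200 subjects, with no group yet */
data pool;
  call streaminit(&seed.);
  length sex $1;
  do id=1 to 200;
    sex=choosec(rand('table',0.5,0.5),'M','F');
    age=rand('integer',18,70);
    u=rand('uniform');          /* random tie-breaker for equal ages */
    output;
  end;
run;

/* Step 2: order subjects so neighbours share SEX and have similar AGE */
proc sort data=pool;
  by sex age u;
run;

/* Step 3: within each consecutive pair, assign the two groups at random */
data tab1;
  if _n_=1 then call streaminit(%eval(&seed.+1));  /* arbitrary second seed */
  set pool;
  retain first_group;
  if mod(_n_,2)=1 then do;
    first_group=rand('integer',1,2);   /* group of the first pair member */
    group=first_group;
  end;
  else group=3-first_group;            /* second member gets the other group */
  drop u first_group;
run;

/* Step 4: rebuild SUBJNO as group digit + 3-digit sequence within group */
proc sort data=tab1;
  by group id;
run;

data tab1;
  set tab1;
  by group;
  length subjno $4;
  if first.group then seq=0;
  seq+1;
  subjno=cats(group,put(seq,z3.));
  drop seq id;
run;

/* Check the balance of both covariates */
proc freq data=tab1;
  tables group*sex / chisq;
run;

proc ttest data=tab1;
  class group;
  var age;
run;
```

Each group gets exactly 100 subjects because every pair contributes one subject to each group. The SEX counts can differ between groups by at most one (only when the number of females is odd, so that one pair mixes F and M), and the AGE distributions should be very close because paired subjects are adjacent in the sorted order.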