Amanda_Lemon
Quartz | Level 8

Hi everyone, I am learning how to do simulations in SAS for different methods of analyzing a two-group pretest-posttest design.

 

I simulated ANCOVA successfully (I think) -- see the code below (this code uses a truncated normal distribution). But now I want to simulate gain scores (y-x), and I am confused... In ANCOVA, I just simulated y under particular conditions, and then tested the model through proc glm.

 

Data MC (keep = x y g sampleID);
Fa = cdf('Normal', -6, 0, 1);
Fb = cdf('Normal', 6, 0, 1); 
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to 500;
    v1 = Fa + (Fb-Fa)*rand('Uniform');
    v2 = Fa + (Fb-Fa)*rand('Uniform');
    x = 0*g + quantile('Normal', v1, 0, 1); /* no effect of group on x */
    y = 1*x + 0*g + quantile('Normal', v2, 0, 1); /* perfect stability between x and y; no effect of g on y */ 
    output;
  end;
 end; 
end;
run;
proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;

But with gain scores this will not work: on the one hand gain = y - x, but on the other hand I need to simulate gain so that I can specify no effect of the group (see the attempt below). That's not right, because gain can't equal two different expressions...

Data MC (keep = x y g gain sampleID); /* gain must be kept, or proc glm below cannot find it */
Fa = cdf('Normal', -6, 0, 1);
Fb = cdf('Normal', 6, 0, 1); 
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to 500;
    v1 = Fa + (Fb-Fa)*rand('Uniform');
    v2 = Fa + (Fb-Fa)*rand('Uniform');
    v3 = Fa + (Fb-Fa)*rand('Uniform');
    x = 0*g + quantile('Normal', v1, 0, 1); /* no effect of group on x */
    y = 1*x + quantile('Normal', v2, 0, 1); /* perfect stability between x and y */ 
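    gain = y - x;                              /* definition of the gain score */
    gain = 0*g + quantile('Normal', v3, 0, 1); /* this overwrites the line above: the contradiction I am asking about */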
    output;
  end;
 end; 
end;
run;
proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model gain = g / ss3;
run;

How do I simulate gain scores properly? I read Rick Wicklin's book "Simulating Data with SAS", but it doesn't cover simulating gain scores or anything similar...

 

Thank you in advance for any help and/or feedback!

10 REPLIES
PGStats
Opal | Level 21

Why not:

 

v1 = Fa + (Fb-Fa)*rand('Uniform');
v2 = Fa + (Fb-Fa)*rand('Uniform');
x = 0*g + quantile('Normal', v1, 0, 1);
y = groupEffect*g + quantile('Normal', v2, 0, 1);
gain = y - x;

which would make x and y uncorrelated.

 

PG
Amanda_Lemon
Quartz | Level 8

But I need x and y to be correlated -- their correlation is one of the parameters I want to be able to manipulate...

 

Also, I am interested specifically in the effect of g on gain scores -- not on y...

PGStats
Opal | Level 21

Have you considered using the SIMNORMAL procedure to simulate correlated normal variates?
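
A minimal sketch of what that could look like, assuming unit variances and an illustrative correlation of 0.6 between x and y (both values are assumptions, not from the thread). The data set names PrePostCov, SimXY, and Gain, and the variable groupEffect, are hypothetical. PROC SIMNORMAL reads a TYPE=COV data set and writes the simulated rows to OUTSIM=; the group effect is then added to y, and gain is simply computed from the two scores.

```sas
/* Covariance matrix and means for (x, y). With unit variances, */
/* the off-diagonal 0.6 is also the correlation (assumed value). */
data PrePostCov(type=COV);
   input _TYPE_ $ _NAME_ $ x y;
   datalines;
COV   x  1.0  0.6
COV   y  0.6  1.0
MEAN  .  0    0
;

/* Draw 1000 correlated (x, y) pairs */
proc simnormal data=PrePostCov outsim=SimXY numreal=1000 seed=12345678;
   var x y;
run;

/* Assign groups, add a hypothetical group effect to y, then compute gain */
data Gain;
   set SimXY;
   groupEffect = 0;          /* 0 = null hypothesis of no group effect */
   g = (_n_ > 500);          /* first 500 rows control, last 500 treatment */
   y = y + groupEffect*g;
   gain = y - x;             /* gain is derived, not simulated separately */
run;
```

Because x does not depend on g here, any shift added to y shows up as the same shift in gain.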

PG
Amanda_Lemon
Quartz | Level 8

Well, I thought about simulating two correlated variables -- I don't know this particular procedure but I read about some other ways to do it. But will it solve my problem? Because I will have simulated x and y, as I have them now -- and then my gain will be a difference score on the one hand and a regression to specify the null group effect on the other... So, I am not sure how simulating two correlated variables will help me. Am I missing something? It's my first attempt to do simulations, so my knowledge is quite shaky...

Rick_SAS
SAS Super FREQ

I do not fully understand your question, but here are a few thoughts:

1. Be sure that you clearly state the hypothesis under which you are running the simulation. Is it the null hypothesis of no difference between treatment and control groups? A 5-point difference? Is your simulation conforming to the model assumptions, or are you trying to violate an assumption to see what happens to the hypothesis tests in the ANCOVA?

2. In an ANCOVA model, the "correlations" arise because you are testing the same individual pre- and post-test. A typical model is

post[i] = beta_0 + beta_1*Group[i] + beta_2*pre[i] + epsilon[i]

where 

Group[i] = 0 if the i_th individual is in the control group and 

Group[i] = 1 if the i_th individual is in the treatment group.

3. Notice you can also write the regression relationship as 

Change[i] = beta_0 + beta_1*Group[i] + (beta_2 - 1)*pre[i] + epsilon[i]

if the research question is to model the change.

4. Don't worry about the distribution of the covariate (truncated normal) until after you get the simulation working. Similarly, don't worry about the outer loop (SampleID) until you have one simulation working to your satisfaction.

 

As I've said, I don't fully understand your design and assumptions. However, this is how I would simulate a basic pre/post test ANCOVA model:

 

data ANCOVA;
b0 = 0;
b1 = 7;
b2 = 1.1;
call streaminit(12345678);
 do g = 0, 1;      /* 50 control and 50 treatment */
  do i = 1 to 50;
    pre = rand("Normal", 60, 10);
    error = rand("Normal", 0, 3);
    post = b0 + b1*g + b2*pre + error;
    output;
  end;
 end; 
run;

ods graphics on;
proc glm data=ANCOVA;
  class g;
  model post = g pre / ss3;
run;
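
If the research question is about change, the same simulated data can be reused. The following is only a sketch: it assumes the ANCOVA data set created above, and the data set name Change and variable name change are hypothetical. The change score is computed from the simulated pre and post values rather than simulated on its own.

```sas
/* Derive the change score from the scores that were already simulated */
data Change;
   set ANCOVA;
   change = post - pre;
run;

/* Gain-score analysis: change modeled on group only */
proc glm data=Change;
   class g;
   model change = g / ss3 solution;
run;
quit;

/* Same model as the ANCOVA, rewritten for change:          */
/* the slope on pre now estimates b2 - 1 instead of b2      */
proc glm data=Change;
   class g;
   model change = g pre / ss3 solution;
run;
quit;
```

With b2 = 1.1 in the simulation, the slope on pre in the second model should come out near 0.1, while the group effect is the same b1 as in the ANCOVA.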

 

 

 

Amanda_Lemon
Quartz | Level 8

Hi Rick,

 

Thank you so much for your reply!

 

What I am trying to do is essentially to replicate the work of these authors: https://www.researchgate.net/publication/258136260_A_Monte_Carlo_Comparison_Study_of_the_Power_of_th...

 

I am not interested in residual change scores, so I want to replicate ANCOVA and change scores. (Their methods are on pp. 7-8). So, essentially, they manipulated 5 things:

1. Sample size (my SampleID loop)

2. Treatment effect (in ANCOVA: b1 in post = b0 + b1*g + b2*pre + error)

3. Stability (I am not sure but I think it's b2)

4. Baseline imbalance: (in ANCOVA it's b3 in pre = b00 + b3*g + error2)

5. Reliability (I am not sure what that is in terms of what is being manipulated...)
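
A rough sketch of how the factors I am sure about could be wired into one simulation, assuming my tentative mapping above (b1 = treatment effect, b2 = stability, b3 = baseline imbalance). The macro variable names, the parameter values, and the data set names are all made up for illustration, and reliability is left out because I don't know how it was manipulated.

```sas
%let nSamples = 1000;   /* number of Monte Carlo samples                 */
%let nPerGrp  = 50;     /* sample size per group                         */
%let b1 = 0;            /* treatment effect on post (0 = null)           */
%let b2 = 0.7;          /* slope of post on pre (tentatively stability)  */
%let b3 = 0;            /* baseline imbalance: group difference in pre   */

data MC(keep=sampleID g pre post gain);
   call streaminit(12345678);
   do sampleID = 1 to &nSamples;
      do g = 0, 1;
         do i = 1 to &nPerGrp;
            pre  = &b3*g + rand("Normal", 0, 1);
            post = &b1*g + &b2*pre + rand("Normal", 0, 1);
            gain = post - pre;     /* computed, never simulated directly */
            output;
         end;
      end;
   end;
run;

/* ANCOVA for every sample */
proc glm data=MC outstat=StatAncova noprint;
   by sampleID;
   class g;
   model post = g pre / ss3;
run;
quit;

/* Gain-score analysis for every sample */
proc glm data=MC outstat=StatGain noprint;
   by sampleID;
   class g;
   model gain = g / ss3;
run;
quit;

/* Proportion of samples in which the group test rejects at alpha = 0.05 */
data Reject;
   set StatAncova(in=a) StatGain;
   where upcase(_SOURCE_) = "G" and _TYPE_ = "SS3";
   length method $8;
   method = ifc(a, "ANCOVA", "Gain");
   reject = (PROB < 0.05);
run;

proc means data=Reject mean;
   class method;
   var reject;
run;
```

With b1 = 0 the rejection proportion estimates the Type I error rate of each method; with b1 different from 0 it estimates power.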

 

I think I am good about ANCOVA. But I am still confused about gain scores (change)... Why did you write (beta_2 - 1)*pre[i]? Why is it b2-1? Gain scores are technically post - pre, so I can't put together what you wrote with the definition of gain scores... 

 

Where I am confused is that gain = post - pre on the one hand, but on the other hand I need to model treatment effect on gain scores, which is gain = beta*g + e...

Rick_SAS
SAS Super FREQ

It sounds like you are not sure how to design the study. I suggest you discuss your issues with your advisor/mentor. Perhaps after you simulate the parts that you know how to do, the other parts will become clearer.  The primary rule of simulation is to explicitly state the statistical hypothesis that you are trying to test. The simulation is often straightforward after you write out the model.

 

Regarding (beta_2 -1), all I did was start with 

post[i] = beta_0 + beta_1*Group[i] + beta_2*pre[i] + epsilon[i]

and then subtract pre[i] from both sides of the equation.
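
Written out step by step, using the same symbols as above:

```latex
\begin{align*}
\text{post}_i &= \beta_0 + \beta_1\,\text{Group}_i + \beta_2\,\text{pre}_i + \varepsilon_i \\
\text{post}_i - \text{pre}_i &= \beta_0 + \beta_1\,\text{Group}_i + \beta_2\,\text{pre}_i - \text{pre}_i + \varepsilon_i \\
\text{Change}_i &= \beta_0 + \beta_1\,\text{Group}_i + (\beta_2 - 1)\,\text{pre}_i + \varepsilon_i
\end{align*}
```

In the special case beta_2 = 1, the pre term drops out and Change[i] = beta_0 + beta_1*Group[i] + epsilon[i]. So the two expressions for the gain do not conflict: the gain is always computed as post - pre, and the group effect on the gain is the beta_1 that was used to simulate post.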

 

 

Amanda_Lemon
Quartz | Level 8
Thank you! I think I figured out where my confusion was.
Rick_SAS
SAS Super FREQ

Wonderful! Congratulations!
