Re: Simulating gain scores

Amanda_Lemon · Posted 12-26-2017 06:01 PM

Hi everyone, I am learning how to do simulations in SAS for different methods of analyzing a two-group pretest-posttest design.

I simulated ANCOVA successfully (I think) -- see the code below (this code uses a truncated normal distribution). But now I want to simulate gain scores (y-x), and I am confused... In ANCOVA, I just simulated y under particular conditions, and then tested the model through proc glm.

Data MC (keep = x y g sampleID);
Fa = cdf('Normal', -6, 0, 1);
Fb = cdf('Normal', 6, 0, 1); 
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to 500;
    v1 = Fa + (Fb-Fa)*rand('Uniform');
    v2 = Fa + (Fb-Fa)*rand('Uniform');
    x = 0*g + quantile('Normal', v1, 0, 1); /* no effect of group on x */
    y = 1*x + 0*g + quantile('Normal', v2, 0, 1); /* perfect stability between x and y; no effect of g on y */ 
    output;
  end;
 end; 
end;
run;
proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;
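
One way the OUTSTAT= data set could be summarized afterwards is sketched below. It assumes the usual OUTSTAT= layout (variables _SOURCE_, _TYPE_ and PROB) and a nominal alpha of 0.05; the data set name Reject and the variable reject are made up for illustration.

```sas
/* Keep only the Type III test of g from each simulated sample */
data Reject;
  set myStat;
  where _TYPE_ = 'SS3' and upcase(_SOURCE_) = 'G';
  reject = (PROB < 0.05);   /* 1 if the group effect is significant */
run;

/* The mean of the 0/1 indicator is the empirical rejection rate */
proc means data=Reject mean;
  var reject;
run;
```

Because the simulation sets the effect of g to 0, this proportion estimates the Type I error rate and should land close to 0.05 if everything is set up correctly.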

But with gain scores this will not work, because on the one hand gain = y - x, but on the other hand I need to simulate gain to specify no effect of the group (see the attempt below...). That's not right, as I can't have gain equal two different expressions...

Data MC (keep = x y g gain sampleID);
Fa = cdf('Normal', -6, 0, 1);
Fb = cdf('Normal', 6, 0, 1); 
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to 500;
    v1 = Fa + (Fb-Fa)*rand('Uniform');
    v2 = Fa + (Fb-Fa)*rand('Uniform');
    v3 = Fa + (Fb-Fa)*rand('Uniform');
    x = 0*g + quantile('Normal', v1, 0, 1); /* no effect of group on x */
    y = 1*x + quantile('Normal', v2, 0, 1); /* perfect stability between x and y */ 
    gain = y - x; 
    gain = 0*g + quantile('Normal', v3, 0, 1);
    output;
  end;
 end; 
end;
run;
proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model gain = g / ss3;
run;

How do I simulate gain scores properly? I read Rick Wicklin's book "Simulating Data with SAS", but it doesn't cover simulations of gain scores or the like...

Thank you in advance for any help and/or feedback!

PGStats · Posted 12-26-2017 06:37 PM

Why not:

v1 = Fa + (Fb-Fa)*rand('Uniform');
v2 = Fa + (Fb-Fa)*rand('Uniform');
x = 0*g + quantile('Normal', v1, 0, 1);
y = groupEffect*g + quantile('Normal', v2, 0, 1);
gain = y - x;

which would make x and y uncorrelated.

PG

Amanda_Lemon · Posted 12-26-2017 07:02 PM

But I need x and y to be correlated -- their correlation is one of the parameters I want to be able to manipulate...

Also, I am interested specifically in the effect of g on gain scores -- not on y...

PGStats · Posted 12-26-2017 11:54 PM

Have you considered using the SIMNORMAL procedure to simulate correlated normal variates?
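
If a separate procedure is more than you need, a DATA step alternative (not SIMNORMAL itself) is to build y from x plus an independent error. A minimal sketch, assuming standard normal x and y; rho, e and the data set name Corr are hypothetical names chosen for illustration:

```sas
data Corr (keep = x y gain);
rho = 0.6;                        /* target correlation between x and y */
call streaminit(12345678);
do i = 1 to 500;
  x = rand('Normal', 0, 1);
  e = rand('Normal', 0, 1);       /* independent of x */
  y = rho*x + sqrt(1 - rho**2)*e; /* Var(y) = 1, Corr(x, y) = rho */
  gain = y - x;                   /* gain is computed, not simulated separately */
  output;
end;
run;

/* Check that the sample correlation is close to rho */
proc corr data=Corr;
  var x y;
run;
```

Here the correlation is a single parameter you can vary, and gain follows from x and y rather than being drawn on its own.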

PG

Amanda_Lemon · Posted 12-27-2017 01:00 AM

Well, I thought about simulating two correlated variables -- I don't know this particular procedure but I read about some other ways to do it. But will it solve my problem? Because I will have simulated x and y, as I have them now -- and then my gain will be a difference score on the one hand and a regression to specify the null group effect on the other... So, I am not sure how simulating two correlated variables will help me. Am I missing something? It's my first attempt to do simulations, so my knowledge is quite shaky...

Ksharp · Posted 12-27-2017 07:22 AM

Why not call @Rick_SAS?

Rick_SAS · Posted 12-29-2017 07:22 PM

I do not fully understand your question, but here are a few thoughts:

1. Be sure that you clearly state the hypothesis under which you are running the simulation. Is it the null hypothesis of no difference between treatment and control groups? A 5-point difference? Is your simulation conforming to the model assumptions, or are you trying to violate an assumption to see what happens to the hypothesis tests in the ANCOVA?

2. In an ANCOVA model, the "correlations" arise because you are testing the same individual pre- and post-test. A typical model is

post[i] = beta_0 + beta_1*Group[i] + beta_2*pre[i] + epsilon[i]

where

Group[i] = 0 if the i_th individual is in the control group and

Group[i] = 1 if the i_th individual is in the treatment group.

3. Notice you can also write the regression relationship as

Change[i] = beta_0 + beta_1*Group[i] + (beta_2 - 1)*pre[i] + epsilon[i]

if the research question is to model the change.

4. Don't worry about the distribution of the covariate (truncated normal) until after you get the simulation working. Similarly, don't worry about the outer loop (SampleID) until you have one simulation working to your satisfaction.

As I've said, I don't fully understand your design and assumptions. However, this is how I would simulate a basic pre/post test ANCOVA model:

data ANCOVA;
b0 = 0;
b1 = 7;
b2 = 1.1;
call streaminit(12345678);
 do g = 0, 1;      /* 50 control and 50 treatment */
  do i = 1 to 50;
    pre = rand("Normal", 60, 10);
    error = rand("Normal", 0, 3);
    post = b0 + b1*g + b2*pre + error;
    output;
  end;
 end; 
run;

ods graphics on;
proc glm data=ANCOVA;
  class g;
  model post = g pre / ss3;
run;
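
The same setup can be carried over to gain scores. This is only a sketch under the same illustrative parameter values as the ANCOVA step above; the data set name GainSim and the variable gain are assumptions, and this may not match your intended design.

```sas
data GainSim;
b0 = 0;
b1 = 7;
b2 = 1.1;
call streaminit(12345678);
 do g = 0, 1;      /* 50 control and 50 treatment */
  do i = 1 to 50;
    pre = rand("Normal", 60, 10);
    error = rand("Normal", 0, 3);
    post = b0 + b1*g + b2*pre + error;
    gain = post - pre;   /* gain is defined once, as a difference */
    output;
  end;
 end;
run;

/* Gain-score analysis: the group effect on the change, without the covariate */
proc glm data=GainSim;
  class g;
  model gain = g / ss3;
run;
quit;
```

Because pre has the same distribution in both groups here, the expected difference in mean gain between the groups is b1. The group effect is still specified through the model for post; gain never needs a second, separate equation.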

Amanda_Lemon · Posted 12-30-2017 05:03 PM

Hi Rick,

Thank you so much for your reply!

What I am trying to do is essentially to replicate the work of these authors: https://www.researchgate.net/publication/258136260_A_Monte_Carlo_Comparison_Study_of_the_Power_of_th...

I am not interested in residual change scores, so I want to replicate ANCOVA and change scores. (Their methods are on pp. 7-8). So, essentially, they manipulated 5 things:

1. Sample size (my SampleID loop)

2. Treatment effect (in ANCOVA: b1 in post = b0 + b1*g + b2*pre + error)

3. Stability (I am not sure but I think it's b2)

4. Baseline imbalance: (in ANCOVA it's b3 in pre = b00 + b3*g + error2)

5. Reliability (I am not sure what that is in terms of what is being manipulated...)
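
A minimal sketch of how factors 2 to 4 could enter one DATA step, assuming the two equations quoted in the list above. The parameter values, the data set name Sim and the variable N are placeholders, and reliability is left out because I don't know how it is manipulated. In this sketch the per-group sample size is the upper bound of the inner i loop, while sampleID indexes the Monte Carlo replications.

```sas
data Sim (keep = sampleID g pre post gain);
b0 = 0;  b1 = 0;    /* treatment effect on post (0 = null hypothesis) */
b2 = 0.5;           /* stability: slope of post on pre */
b00 = 0; b3 = 0.2;  /* baseline imbalance: group shift in pre */
N = 50;             /* per-group sample size */
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to N;
    error2 = rand("Normal", 0, 1);
    pre  = b00 + b3*g + error2;
    error = rand("Normal", 0, 1);
    post = b0 + b1*g + b2*pre + error;
    gain = post - pre;
    output;
  end;
 end;
end;
run;
```

With these equations the expected group difference in gain works out to b1 + (b2 - 1)*b3, so baseline imbalance and stability both affect what the gain-score test picks up.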

I think I am good about ANCOVA. But I am still confused about gain scores (change)... Why did you write (beta_2 - 1)*pre[i]? Why is it b2-1? Gain scores are technically post - pre, so I can't put together what you wrote with the definition of gain scores...

Where I am confused is that gain = post - pre on the one hand, but on the other hand I need to model treatment effect on gain scores, which is gain = beta*g + e...

Rick_SAS · Posted 12-31-2017 07:19 AM

It sounds like you are not sure how to design the study. I suggest you discuss your issues with your advisor/mentor. Perhaps after you simulate the parts that you know how to do, the other parts will become clearer. The primary rule of simulation is to explicitly state the statistical hypothesis that you are trying to test. The simulation is often straightforward after you write out the model.

Regarding (beta_2 -1), all I did was start with

post[i] = beta_0 + beta_1*Group[i] + beta_2*pre[i] + epsilon[i]

and then subtract pre[i] from both sides of the equation.
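
Written out step by step, in the same notation:

post[i] - pre[i] = beta_0 + beta_1*Group[i] + beta_2*pre[i] - pre[i] + epsilon[i]

Change[i] = beta_0 + beta_1*Group[i] + (beta_2 - 1)*pre[i] + epsilon[i]

So Change = post - pre remains the definition of the gain score, and the regression is what that definition implies if the ANCOVA model holds. In the special case beta_2 = 1, the pre term drops out and Change[i] = beta_0 + beta_1*Group[i] + epsilon[i].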

Amanda_Lemon · Posted 01-09-2018 07:36 PM

Thank you! I think I figured out where my confusion was.

Rick_SAS · Posted 01-10-2018 05:33 AM

Wonderful! Congratulations!