Solved: Re: Saving p-values

Amanda_Lemon · Posted 12-22-2017 11:39 PM

Hi,

I am trying to do a Monte Carlo simulation of ANCOVA. I generated random variables and simulated data from regressions models below. x is a covariate, g is a grouping variable (0 for a control group, 1 for a treatment group), y is a dependent variable. I am manipulating coefficients. In this syntax, I set the treatment effect on y to be 0, the treatment effect on x to be 0.6, and the effect of x on y to be 1. I also set the number of samples to be 1000 and the sample size to be 800 (about a half being a treatment group).

data MC (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
  do i = 1 to 800;
    g = rand("Bernoulli", 0.5);
    x = 0.6*g + rand("Normal"); 
    y = 1*x + 0*g + rand("Normal");
    output;
  end;
end;
run;

So, after that, I need to test the effect of g on y in each of 1000 samples. I wrote this syntax:

proc glm data = MC;
  BY sampleID; 
  class g;
  model y = g;
run;

It works but the problem is that I need only p values (for F values) from the output to count how many were less then .05 (which will serve as an estimate of Type I error rate). So, I can't figure out how to put these p-values in a data set, an array, or something like that (extracting p-values by hand for 1000 samples is not an option...).

I would very much appreciate any help and/or feedback. Thank you.

PGStats · Posted 12-23-2017 01:03 AM

Here is how to do the counting (for effect g):


proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;

proc sql;
select 
    sum(prob<0.05) as nb_5pct,
    count(prob) as nb_sample,
    sum(prob<0.05) / count(prob) as prop_5pct
from myStat
where _SOURCE_ = "g";
quit;

PG

View solution in original post

PGStats · Posted 12-22-2017 11:55 PM

Start with:


proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;

PG

Amanda_Lemon · Posted 12-23-2017 12:06 AM

Thank you! Now I have one table with all p values!

Last question -- how now to count p that less than 0.05? I though I can use myStat now as a data set and PROB as a variable so that I could simply count through the if statement -- but that doesn't seem to work...

Thank you in advance.

Reeza · Posted 12-23-2017 12:24 AM

@Amanda_Lemon wrote:

Thank you! Now I have one table with all p values!

Last question -- how now to count p that less than 0.05? I though I can use myStat now as a data set and PROB as a variable so that I could simply count through the if statement -- but that doesn't seem to work...

Thank you in advance.

What does that code look like, that logic should work fine.

PGStats · Posted 12-23-2017 01:03 AM

Here is how to do the counting (for effect g):


proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;

proc sql;
select 
    sum(prob<0.05) as nb_5pct,
    count(prob) as nb_sample,
    sum(prob<0.05) / count(prob) as prop_5pct
from myStat
where _SOURCE_ = "g";
quit;

PG

Amanda_Lemon · Posted 12-23-2017 01:22 AM

Thank you so much!! It works!

Amanda_Lemon · Posted 12-23-2017 02:04 AM

An unrelated follow-up question. When I simulate the binary variable g, I use the Bernoulli distribution. I want the two groups to be equal, so I thought that setting the probability to 0.5 will do that. But it produces approximate splits in half (like, 423 and 377). I know I can just assign 0 to g for i from 1 to 400 and 1 to g for i from 401 to 800. But is there a way to _randomly_ assign 0 and 1 to g but have the groups precisely equal (400 and 400)?

Thanks again for your help.

data MC (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
  do i = 1 to 800;
    g = rand("Bernoulli", 0.5);
    x = 0.6*g + rand("Normal"); 
    y = 1*x + 0*g + rand("Normal");
    output;
  end;
end;
run;

PGStats · Posted 12-23-2017 02:22 AM

There is no advantage in having random g values, as far as I can tell. But if you want, you can do:

data temp (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
    do g = 0, 1;
        do i = 1 to 400;
            rnd = rand("Uniform");
            x = 0.6*g + rand("Normal"); 
            y = 1*x + 0*g + rand("Normal");
            output;
            end;
        end;
    end;
run;

proc sort data=temp out=mc(drop=rnd); 
by sampleId rnd; 
run;

(untested)

PG

Amanda_Lemon · Posted 12-23-2017 02:41 AM

That works! Thanks again!!

Amanda_Lemon · Posted 12-23-2017 05:37 PM

I ran into another problem... Everything works just fine when I simulate a normal distribution but when I changed it to a truncated normal distribution, the PROC GLM statement stopped working... (data are simulated fine). When I print the GLM results, it shows that SS for g were not computed (see the screenshot). Why is that? Thank you in advance.

data MC (keep = x y g sampleID);
Fa = cdf('Normal', -6, 0, 1); /* for a = -6 */
Fb = cdf('Normal', 6, 0, 1); /* for b = 6 */
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to 500;
    v = Fa + (Fb-Fa)*rand('Uniform'); /* V ~ U(F(a), F(b)) */
    x = 0*g + quantile('Normal', v, 0, 1); /* truncated normal on [a,b] */
    y = 1*x + 0*g + quantile('Normal', v, 0, 1);
    output;
  end;
 end; 
end;
run;
proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;
proc print data = myStat; 
run;

PGStats · Posted 12-23-2017 11:12 PM

The problem stems from the fact that y = 2*x in all of your simulated observations. To replicate the structure of your previous simulation you would need:

    v = Fa + (Fb-Fa)*rand('Uniform'); /* V ~ U(F(a), F(b)) */
    w = Fa + (Fb-Fa)*rand('Uniform'); /* W ~ U(F(a), F(b)) */
    x = 0*g + quantile('Normal', v, 0, 1); /* truncated normal on [a,b] */
    y = 1*x + 0*g + quantile('Normal', w, 0, 1);

PG

Amanda_Lemon · Posted 12-25-2017 01:20 AM

Thank you so much!! It works!

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away