BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Amanda_Lemon
Quartz | Level 8

Hi,

 

I am trying to do a Monte Carlo simulation of ANCOVA. I generated random variables and simulated data from regressions models below. x is a covariate, g is a grouping variable (0 for a control group, 1 for a treatment group), y is a dependent variable. I am manipulating coefficients. In this syntax, I set the treatment effect on y to be 0, the treatment effect on x to be 0.6, and the effect of x on y to be 1. I also set the number of samples to be 1000 and the sample size to be 800 (about a half being a treatment group).

 

data MC (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
  do i = 1 to 800;
    g = rand("Bernoulli", 0.5);
    x = 0.6*g + rand("Normal"); 
    y = 1*x + 0*g + rand("Normal");
    output;
  end;
end;
run;

So, after that, I need to test the effect of g on y in each of 1000 samples. I wrote this syntax:

proc glm data = MC;
  BY sampleID; 
  class g;
  model y = g;
run;

It works but the problem is that I need only p values (for F values) from the output to count how many were less then .05 (which will serve as an estimate of Type I error rate). So, I can't figure out how to put these p-values in a data set, an array, or something like that (extracting p-values by hand for 1000 samples is not an option...).

 

I would very much appreciate any help and/or feedback. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Here is how to do the counting (for effect g):

 


proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;

proc sql;
select 
    sum(prob<0.05) as nb_5pct,
    count(prob) as nb_sample,
    sum(prob<0.05) / count(prob) as prop_5pct
from myStat
where _SOURCE_ = "g";
quit;

PG

View solution in original post

11 REPLIES 11
PGStats
Opal | Level 21

Start with:

 


proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;
PG
Amanda_Lemon
Quartz | Level 8

Thank you! Now I have one table with all p values!

Last question -- how now to count p that less than 0.05? I though I can use myStat now as a data set and PROB as a variable so that I could simply count through the if statement -- but that doesn't seem to work...

Thank you in advance.

Reeza
Super User

@Amanda_Lemon wrote:

Thank you! Now I have one table with all p values!

Last question -- how now to count p that less than 0.05? I though I can use myStat now as a data set and PROB as a variable so that I could simply count through the if statement -- but that doesn't seem to work...

Thank you in advance.


What does that code look like, that logic should work fine. 

PGStats
Opal | Level 21

Here is how to do the counting (for effect g):

 


proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;

proc sql;
select 
    sum(prob<0.05) as nb_5pct,
    count(prob) as nb_sample,
    sum(prob<0.05) / count(prob) as prop_5pct
from myStat
where _SOURCE_ = "g";
quit;

PG
Amanda_Lemon
Quartz | Level 8

Thank you so much!! It works!

Amanda_Lemon
Quartz | Level 8

An unrelated follow-up question. When I simulate the binary variable g, I use the Bernoulli distribution. I want the two groups to be equal, so I thought that setting the probability to 0.5 will do that. But it produces approximate splits in half (like, 423 and 377). I know I can just assign 0 to g for i from 1 to 400 and 1 to g for i from 401 to 800. But is there a way to _randomly_ assign 0 and 1 to g but have the groups precisely equal (400 and 400)?

 

Thanks again for your help.

 

data MC (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
  do i = 1 to 800;
    g = rand("Bernoulli", 0.5);
    x = 0.6*g + rand("Normal"); 
    y = 1*x + 0*g + rand("Normal");
    output;
  end;
end;
run;
PGStats
Opal | Level 21

There is no advantage in having random g values, as far as I can tell. But if you want, you can do:

 

data temp (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
    do g = 0, 1;
        do i = 1 to 400;
            rnd = rand("Uniform");
            x = 0.6*g + rand("Normal"); 
            y = 1*x + 0*g + rand("Normal");
            output;
            end;
        end;
    end;
run;

proc sort data=temp out=mc(drop=rnd); 
by sampleId rnd; 
run;

(untested)

 

PG
Amanda_Lemon
Quartz | Level 8
That works! Thanks again!!
Amanda_Lemon
Quartz | Level 8

I ran into another problem... Everything works just fine when I simulate a normal distribution but when I changed it to a truncated normal distribution, the PROC GLM statement stopped working... (data are simulated fine). When I print the GLM results, it shows that SS for g were not computed (see the screenshot). Why is that? Thank you in advance.

 

data MC (keep = x y g sampleID);
Fa = cdf('Normal', -6, 0, 1); /* for a = -6 */
Fb = cdf('Normal', 6, 0, 1); /* for b = 6 */
call streaminit(12345678);
do sampleID = 1 to 1000;
 do g = 0, 1;
  do i = 1 to 500;
    v = Fa + (Fb-Fa)*rand('Uniform'); /* V ~ U(F(a), F(b)) */
    x = 0*g + quantile('Normal', v, 0, 1); /* truncated normal on [a,b] */
    y = 1*x + 0*g + quantile('Normal', v, 0, 1);
    output;
  end;
 end; 
end;
run;
proc glm data = MC outstat=myStat noprint;
  BY sampleID; 
  class g;
  model y = g x / ss3;
run;
proc print data = myStat; 
run;

print .jpg

PGStats
Opal | Level 21

The problem stems from the fact that y = 2*x in all of your simulated observations. To replicate the structure of your previous simulation you would need:

 

    v = Fa + (Fb-Fa)*rand('Uniform'); /* V ~ U(F(a), F(b)) */
    w = Fa + (Fb-Fa)*rand('Uniform'); /* W ~ U(F(a), F(b)) */
    x = 0*g + quantile('Normal', v, 0, 1); /* truncated normal on [a,b] */
    y = 1*x + 0*g + quantile('Normal', w, 0, 1);
PG
Amanda_Lemon
Quartz | Level 8
Thank you so much!! It works!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2795 views
  • 1 like
  • 3 in conversation