turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Saving p-values

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-22-2017 11:39 PM

Hi,

I am trying to do a Monte Carlo simulation of ANCOVA. I generated random variables and simulated data from regressions models below. x is a covariate, g is a grouping variable (0 for a control group, 1 for a treatment group), y is a dependent variable. I am manipulating coefficients. In this syntax, I set the treatment effect on y to be 0, the treatment effect on x to be 0.6, and the effect of x on y to be 1. I also set the number of samples to be 1000 and the sample size to be 800 (about a half being a treatment group).

```
data MC (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
do i = 1 to 800;
g = rand("Bernoulli", 0.5);
x = 0.6*g + rand("Normal");
y = 1*x + 0*g + rand("Normal");
output;
end;
end;
run;
```

So, after that, I need to test the effect of g on y in each of 1000 samples. I wrote this syntax:

```
proc glm data = MC;
BY sampleID;
class g;
model y = g;
run;
```

It works but the problem is that I need only p values (for F values) from the output to count how many were less then .05 (which will serve as an estimate of Type I error rate). So, I can't figure out how to put these p-values in a data set, an array, or something like that (extracting p-values by hand for 1000 samples is not an option...).

I would very much appreciate any help and/or feedback. Thank you.

Accepted Solutions

Solution

12-23-2017
01:21 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-23-2017 01:03 AM

Here is how to do the counting (for effect g):

```
proc glm data = MC outstat=myStat noprint;
BY sampleID;
class g;
model y = g x / ss3;
run;
proc sql;
select
sum(prob<0.05) as nb_5pct,
count(prob) as nb_sample,
sum(prob<0.05) / count(prob) as prop_5pct
from myStat
where _SOURCE_ = "g";
quit;
```

PG

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-22-2017 11:55 PM

Start with:

```
proc glm data = MC outstat=myStat noprint;
BY sampleID;
class g;
model y = g x / ss3;
run;
```

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-23-2017 12:06 AM - edited 12-23-2017 12:09 AM

Thank you! Now I have one table with all p values!

Last question -- how now to count p that less than 0.05? I though I can use myStat now as a data set and PROB as a variable so that I could simply count through the if statement -- but that doesn't seem to work...

Thank you in advance.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-23-2017 12:24 AM

Amanda_Lemon wrote:

Last question -- how now to count p that less than 0.05? I though I can use myStat now as a data set and PROB as a variable so that I could simply count through the if statement -- but that doesn't seem to work...

Thank you in advance.

What does that code look like, that logic should work fine.

Solution

12-23-2017
01:21 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-23-2017 01:03 AM

Here is how to do the counting (for effect g):

```
proc glm data = MC outstat=myStat noprint;
BY sampleID;
class g;
model y = g x / ss3;
run;
proc sql;
select
sum(prob<0.05) as nb_5pct,
count(prob) as nb_sample,
sum(prob<0.05) / count(prob) as prop_5pct
from myStat
where _SOURCE_ = "g";
quit;
```

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-23-2017 01:22 AM

Thank you so much!! It works!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-23-2017 02:04 AM

An unrelated follow-up question. When I simulate the binary variable g, I use the Bernoulli distribution. I want the two groups to be equal, so I thought that setting the probability to 0.5 will do that. But it produces approximate splits in half (like, 423 and 377). I know I can just assign 0 to g for i from 1 to 400 and 1 to g for i from 401 to 800. But is there a way to _randomly_ assign 0 and 1 to g but have the groups precisely equal (400 and 400)?

Thanks again for your help.

```
data MC (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
do i = 1 to 800;
g = rand("Bernoulli", 0.5);
x = 0.6*g + rand("Normal");
y = 1*x + 0*g + rand("Normal");
output;
end;
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-23-2017 02:22 AM

There is no advantage in having random g values, as far as I can tell. But if you want, you can do:

```
data temp (drop=i);
call streaminit(12345678);
do sampleID = 1 to 1000;
do g = 0, 1;
do i = 1 to 400;
rnd = rand("Uniform");
x = 0.6*g + rand("Normal");
y = 1*x + 0*g + rand("Normal");
output;
end;
end;
end;
run;
proc sort data=temp out=mc(drop=rnd);
by sampleId rnd;
run;
```

(untested)

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-23-2017 02:41 AM

That works! Thanks again!!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-23-2017 05:37 PM

I ran into another problem... Everything works just fine when I simulate a normal distribution but when I changed it to a truncated normal distribution, the PROC GLM statement stopped working... (data are simulated fine). When I print the GLM results, it shows that SS for g were not computed (see the screenshot). Why is that? Thank you in advance.

```
data MC (keep = x y g sampleID);
Fa = cdf('Normal', -6, 0, 1); /* for a = -6 */
Fb = cdf('Normal', 6, 0, 1); /* for b = 6 */
call streaminit(12345678);
do sampleID = 1 to 1000;
do g = 0, 1;
do i = 1 to 500;
v = Fa + (Fb-Fa)*rand('Uniform'); /* V ~ U(F(a), F(b)) */
x = 0*g + quantile('Normal', v, 0, 1); /* truncated normal on [a,b] */
y = 1*x + 0*g + quantile('Normal', v, 0, 1);
output;
end;
end;
end;
run;
proc glm data = MC outstat=myStat noprint;
BY sampleID;
class g;
model y = g x / ss3;
run;
proc print data = myStat;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Amanda_Lemon

12-23-2017 11:12 PM

The problem stems from the fact that *y = 2*x* in all of your simulated observations. To replicate the structure of your previous simulation you would need:

```
v = Fa + (Fb-Fa)*rand('Uniform'); /* V ~ U(F(a), F(b)) */
w = Fa + (Fb-Fa)*rand('Uniform'); /* W ~ U(F(a), F(b)) */
x = 0*g + quantile('Normal', v, 0, 1); /* truncated normal on [a,b] */
y = 1*x + 0*g + quantile('Normal', w, 0, 1);
```

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-25-2017 01:20 AM

Thank you so much!! It works!