Generating data from Bernoulli distribution

08-01-2016 01:08 PM

Hello Rick. My question is:

What is the difference between generating data from a Bernoulli distribution in which the binary response takes the value "zero" with probability p and the value "one" with probability (1-p), and generating data from a Bernoulli distribution in which the response takes the value "one" with probability p and the value "zero" with probability (1-p)?

Thank you

All Replies

08-01-2016 01:25 PM

There is no mathematical difference. In the first case the "event" is 0; in the second case the "event" is 1.

If this is part of simulating data from a logistic regression model, then you can control the value of the event that you are modeling. By default PROC LOGISTIC models the "event" as the first ordered category, which would be 0 for a 0/1 response variable. You can model 1 by using the following MODEL statement:

MODEL y(event='1') = x1 x2 x3 ...;

In a simulation context, if you want the parameter estimates to match the parameters in the simulation, you need to make sure that the event that you are modeling matches the response value that has probability p.
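
A short derivation may help show why the choice of event matters for the estimates. It is added here as an illustration and assumes that $p$ denotes the probability that the response equals 1 and that $\eta = X\beta$ is the linear predictor of the logistic model:

$$
B \sim \mathrm{Bernoulli}(p) \;\Longrightarrow\; 1 - B \sim \mathrm{Bernoulli}(1-p)
$$

$$
\operatorname{logit}(1-p) = \log\frac{1-p}{p} = -\log\frac{p}{1-p} = -\operatorname{logit}(p) = -X\beta
$$

Under these assumptions, modeling the other event describes the same data but flips the sign of every coefficient, including the intercept.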

08-01-2016 02:41 PM

You mentioned that:

To simulate logistic data, you need to do the following:

(1) Assign the design matrix (X) of the explanatory variables. This step is done once. It establishes the values of the explanatory variables in the (simulated) study.

(2) Compute the linear predictor, η = X β, where β is a vector of parameters. The parameters are the "true values" of the regression coefficients.

(3) Transform the linear predictor by the logistic (inverse logit) function. The transformed values are in the range (0,1) and represent probabilities for each observation of the explanatory variables.

(4) Simulate a binary response vector from the Bernoulli distribution, where each 0/1 response is randomly generated according to the specified probabilities from Step 3.
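
The four steps quoted above can be sketched in a SAS DATA step. This is a minimal illustration under assumed values, not the code from the quoted source: the data set name LogisticSim, the explanatory variables x1 and x2, the sample size, the seed, and the coefficients (-1, 2, -0.5) are all hypothetical.

```sas
data LogisticSim;
   call streaminit(54321);            /* arbitrary seed, for reproducibility */
   do i = 1 to 5000;                  /* hypothetical sample size */
      /* Step 1: explanatory variables (drawn at random here) */
      x1 = rand("Normal");
      x2 = rand("Uniform");
      /* Step 2: linear predictor with hypothetical "true" parameters */
      eta = -1 + 2*x1 - 0.5*x2;
      /* Step 3: inverse logit maps eta into the interval (0,1) */
      p = 1 / (1 + exp(-eta));
      /* Step 4: y = 1 with probability p, y = 0 with probability 1-p */
      y = rand("Bernoulli", p);
      output;
   end;
run;

/* Model the event y=1 so the estimates are comparable to (-1, 2, -0.5) */
proc logistic data=LogisticSim;
   model y(event='1') = x1 x2;
run;
```

If the response were instead generated with rand("Bernoulli", 1-p), or if the EVENT= option were omitted so that PROC LOGISTIC modeled y=0, the fitted coefficients would be expected to come out with the opposite signs.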

**My question is:**

**For step (4),** how can we differentiate between generating data from a Bernoulli distribution in which the binary response takes the value "zero" with probability p and the value "one" with probability (1-p), and generating data from a Bernoulli distribution in which the response takes the value "one" with probability p and the value "zero" with probability (1-p)?

**I mean,** how can we guarantee that the data come from a Bernoulli distribution in which the binary response takes the value "zero" with probability p and the value "one" with probability (1-p)?

Thank you

Solution

08-01-2016 07:37 PM

08-01-2016 03:06 PM

If you use the statement

b = RAND("Bernoulli", p);

then b gets the value 1 with probability p.

If you want b to have the value 0 with probability p, you would use

b = RAND("Bernoulli", 1-p);
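
One way to check the two statements is to simulate many draws and compare the sample proportions. The following is a minimal sketch, not part of the original reply; the data set name BernCheck, the variable names, the seed, the sample size, and the value p = 0.3 are arbitrary choices for illustration.

```sas
data BernCheck;
   call streaminit(12345);               /* arbitrary seed, for reproducibility */
   p = 0.3;                              /* illustrative probability */
   do i = 1 to 10000;
      b_one  = rand("Bernoulli", p);     /* value 1 with probability p */
      b_zero = rand("Bernoulli", 1-p);   /* value 0 with probability p */
      output;
   end;
   keep b_one b_zero;
run;

/* The mean of a 0/1 variable is the proportion of 1s */
proc means data=BernCheck n mean;
   var b_one b_zero;
run;
```

With these assumed settings, the mean of b_one should be close to 0.3 and the mean of b_zero close to 0.7, which is the same as saying that b_zero equals 0 about 30% of the time.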

08-01-2016 07:36 PM

Thanks for your help.

Using simulation, I built confidence intervals for an unknown parameter (the coefficient of variation) under different combinations of the true parameters (mu and sigma) that are needed to construct these intervals, and I got the same coverage probability for these confidence intervals. Is this correct? What is the explanation?

Thank you