## question about the mixture of two normal distributions

Occasional Contributor
Posts: 17

D1 is a data set of 100 random values drawn from a normal distribution (mean=5, std=2).
D2 is a data set of 80 random values drawn from a normal distribution (mean=3, std=3).
Based on D1 and D2, D3 is a data set of 120 values generated as a mixture: 30% Normal(mean=5, std=2) + 70% Normal(mean=3, std=3).

How can I estimate the proportions of D1 and D2 in data set D3?
Super User
Posts: 23,261

## Re: question

Two ways come to mind:

1. Simulation

2. Theory: look at the joint probability distribution and work it out from there.

The correct method is likely the one you've been taught in the course. If it's theoretical, check your textbook; this isn't a SAS question.

If it is simulation, what do you have so far?

I also feel like the question may be missing something. Is this the word-for-word question from your assignment, or have you paraphrased it?

librasantosh wrote:
Data generated using random number with Normal distribution (mean=5 std=2) as dataset D1 which has 100 elements.
Data generated using random number with Normal distribution (mean=3 std=3) as dataset D2 which has 80 elements.
Based on D1 and D2, data generated with Normal distribution at 30% Normal (mean=5, std=2)+70% Normal (mean=3, std=3) as D3 which has 120 elements

How to estimate proportion of D1 and D2 in dataset D3?

Posts: 5,479

## Re: question

Proc FMM (Finite Mixture Models) does this kind of estimation.

PG

SAS Super FREQ
Posts: 4,171

## Re: question

It's not clear whether you want to do this in SAS or whether it is a theoretical question.

If in SAS, it sounds like you are simulating a random sample from a mixture of normal distributions. However, your question seems to imply that the data set that you want is a random mixture of SAMPLES, where the samples are obtained beforehand.

Anyway, if you read the article, you will see that the general technique is

1. Generate random Bernoulli variate:

b = rand("Bern", 0.3);

2. Use the 0/1 value to determine if you should choose a random sample from the first or second distribution.

I think what I would do is use PROC SURVEYSELECT (or, easier, PROC IML) to sample 120 elements from each data set with replacement. Then merge the results and use the above technique to get your simulated sample.
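As a rough sketch of the resampling idea above: the DATA step below is a stand-in for the PROC SURVEYSELECT or PROC IML approach, not that code itself. It assumes D1 and D2 each store their values in a variable named x (a hypothetical name), and the seeds are arbitrary. For each of the 120 rows it draws a Bernoulli(0.3) variate, then picks a random row, with replacement, from D1 or D2 accordingly.

```sas
/* Assumed setup: D1 and D2 hold their values in a variable named x */
data D1(keep=x);
   call streaminit(101);                 /* arbitrary seed */
   do i = 1 to 100;
      x = rand("Normal", 5, 2);          /* mean=5, std=2 */
      output;
   end;
run;

data D2(keep=x);
   call streaminit(202);                 /* arbitrary seed */
   do i = 1 to 80;
      x = rand("Normal", 3, 3);          /* mean=3, std=3 */
      output;
   end;
run;

/* Build D3: 120 rows, each resampled from D1 (prob 0.3) or D2 (prob 0.7) */
data D3(keep=y source);
   call streaminit(303);                 /* arbitrary seed */
   do i = 1 to 120;
      b = rand("Bernoulli", 0.3);        /* 1 with probability 0.3 */
      if b = 1 then do;
         k = ceil(n1 * rand("Uniform")); /* random row number in 1..n1 */
         set D1(rename=(x=y)) point=k nobs=n1;
         source = 1;
      end;
      else do;
         k = ceil(n2 * rand("Uniform")); /* random row number in 1..n2 */
         set D2(rename=(x=y)) point=k nobs=n2;
         source = 2;
      end;
      output;
   end;
   stop;                                 /* required when reading with POINT= */
run;

/* Realized share of rows taken from each source */
proc freq data=D3;
   tables source;
run;
```

The PROC FREQ table shows the realized share from each source, which varies around 30% from run to run. The source variable is kept only so the true membership can be compared with an estimate later.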

If this is a theoretical question, the answer is to look at the expected value of the proportions. You expect 0.3*120 = 36 observations from D1 and 0.7*120 = 84 observations from D2. Use those values and the sizes of D1 and D2 to answer the question.
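To make the arithmetic explicit, and to show how much the realized counts can vary: if each of the 120 observations is assigned to the first component independently with probability 0.3 (an assumption about how D3 was built), then the count $N_1$ from the first component is binomial and $N_2 = 120 - N_1$:

$$
N_1 \sim \mathrm{Binomial}(120,\ 0.3), \qquad
E[N_1] = 120 \times 0.3 = 36, \qquad
E[N_2] = 120 \times 0.7 = 84, \qquad
\mathrm{SD}(N_1) = \sqrt{120 \times 0.3 \times 0.7} = \sqrt{25.2} \approx 5.02
$$

So a single simulated D3 can easily contain anywhere from about 31 to 41 values from the first component rather than exactly 36.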

Occasional Contributor
Posts: 10

## Re: question

Add me as a friend ?

SAS Super FREQ
Posts: 4,171

## Re: question

You can add me as a friend. That enables you to get notified (if you wish) on my activity, such as when I answer a question.

Occasional Contributor
Posts: 10

## Re: question

Thank you Rick, you're really the best !

Occasional Contributor
Posts: 17

## question on estimating proportions

D1 is a data set of 100 random values drawn from a normal distribution (mean=5, std=2).
D2 is a data set of 80 random values drawn from a normal distribution (mean=3, std=3).
Based on D1 and D2, D3 is a data set of 120 values generated as a mixture: 30% Normal(mean=5, std=2) + 70% Normal(mean=3, std=3).

How can I estimate the proportions of D1 and D2 in data set D3?

If you have any ideas, please let me know.

SAS Employee
Posts: 367

## Re: question on estimating proportions

Sounds like a job for PROC FMM. Given data set D3 with response values in Y:

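```sas
proc fmm data=D3;     /* D3 holds the mixture sample, response in y */
   model y = / k=2;   /* intercept-only model with two normal components */
run;
```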
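To see how well this recovers a known mixing proportion, here is a hedged sanity check that is not part of the original answer. It simulates directly from the two normal distributions rather than from the stored samples D1 and D2. The data set name SimMix, the variable comp, the seed, and the sample size of 5000 are arbitrary choices for illustration.

```sas
/* Simulate a 30% / 70% mixture with the true component recorded in comp */
data SimMix(keep=y comp);
   call streaminit(2718);                /* arbitrary seed */
   do i = 1 to 5000;
      if rand("Bernoulli", 0.3) then do;
         comp = 1;
         y = rand("Normal", 5, 2);       /* mean=5, std=2 */
      end;
      else do;
         comp = 2;
         y = rand("Normal", 3, 3);       /* mean=3, std=3 */
      end;
      output;
   end;
run;

/* True realized proportions, for comparison with the fitted model */
proc freq data=SimMix;
   tables comp;
run;

/* Fit a two-component normal mixture; comp is not used in the fit */
proc fmm data=SimMix;
   model y = / dist=normal k=2;
run;
```

PROC FMM reports the estimated mixing probabilities alongside the component means and variances. Because these two components overlap heavily, the estimate can be noisy, especially with only 120 observations, and the order of the components in the output is arbitrary, so it may not line up with 0.3 / 0.7 directly.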