question about the mixture of two normal distributions

Occasional Contributor
Posts: 17

Dataset D1 has 100 elements, generated as random numbers from a Normal distribution (mean=5, std=2).
Dataset D2 has 80 elements, generated as random numbers from a Normal distribution (mean=3, std=3).
Based on D1 and D2, dataset D3 has 120 elements, generated as a mixture of 30% Normal (mean=5, std=2) + 70% Normal (mean=3, std=3).

How can I estimate the proportions of D1 and D2 in dataset D3?
Super User
Posts: 21,464

Re: question

Posted in reply to librasantosh

Two ways come to mind:

1. Simulation

2. Theory: look at the joint probability distribution and see.

 

The correct method is likely what you’ve been taught in the course. If it’s theoretical, check your text; in that case this isn’t a SAS question.

If it is simulation, what do you have so far?  

 

I also feel like the question may be missing something. Is this the word-for-word question from your assignment, or have you paraphrased it?

 



PS: Asking the same question multiple times is unhelpful.

Esteemed Advisor
Posts: 5,113

Re: question

Posted in reply to librasantosh

Proc FMM (Finite Mixture Models) does this kind of estimation.

PG
Occasional Contributor
Posts: 17

Re: question

thanks for reply

SAS Super FREQ
Posts: 3,900

Re: question

Posted in reply to librasantosh

It's not clear whether you want to do this in SAS or whether it is a theoretical question.

 

If in SAS, it sounds like you are simulating a random sample from a mixture of normal distributions. However, your question seems to imply that the data set that you want is a random mixture of SAMPLES, where the samples are obtained beforehand.

 

Anyway, if you read the article, you will see that the general technique is:

1. Generate random Bernoulli variate:

    b = rand("Bern", 0.3);

2. Use the 0/1 value to determine if you should choose a random sample from the first or second distribution.
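
As a minimal sketch of those two steps, assuming you want to draw directly from the 30%/70% mixture of the two normal distributions in the question (the data set name D3, the variable names, and the seed are my own choices, not something from the original post):

```sas
/* Sketch: draw 120 observations from 0.3*N(5,2) + 0.7*N(3,3).      */
/* Data set name, variable names, and seed are illustrative only.   */
data D3;
   call streaminit(12345);                      /* arbitrary seed for reproducibility */
   do i = 1 to 120;
      b = rand("Bernoulli", 0.3);               /* b = 1 with probability 0.3 */
      if b = 1 then y = rand("Normal", 5, 2);   /* first component: mean 5, std 2 */
      else          y = rand("Normal", 3, 3);   /* second component: mean 3, std 3 */
      output;
   end;
   drop i;
run;
```

Keeping b in the output lets you count how many observations actually came from each component, for example with PROC FREQ.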

 

I think what I would do is use PROC SURVEYSELECT (or, easier, PROC IML) to sample 120 elements from each data set with replacement. Then merge the results and use the above technique to get your simulated sample:
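
A rough sketch of that resampling approach with PROC SURVEYSELECT might look like the following. It assumes D1 and D2 each hold the values in a numeric variable named y; the first two DATA steps only create stand-in copies so the example runs on its own, and the names S1, S2, y1, y2 and all seeds are hypothetical.

```sas
/* Stand-ins for the existing samples; skip these two steps if D1 and D2 already exist */
data D1;
   call streaminit(1);
   do i = 1 to 100; y = rand("Normal", 5, 2); output; end;
   drop i;
run;
data D2;
   call streaminit(2);
   do i = 1 to 80; y = rand("Normal", 3, 3); output; end;
   drop i;
run;

/* Sample 120 elements from each data set with replacement (METHOD=URS). */
/* OUTHITS writes one output row per selection, so S1 and S2 have 120 rows each. */
proc surveyselect data=D1 out=S1 method=urs n=120 outhits seed=11 noprint;
run;
proc surveyselect data=D2 out=S2 method=urs n=120 outhits seed=22 noprint;
run;

/* One-to-one merge, then use a Bernoulli draw to pick a value from S1 or S2 */
data D3;
   call streaminit(33);
   merge S1(keep=y rename=(y=y1)) S2(keep=y rename=(y=y2));
   b = rand("Bernoulli", 0.3);    /* b = 1 with probability 0.3 */
   if b = 1 then y = y1;          /* take the resampled D1 value */
   else          y = y2;          /* take the resampled D2 value */
   keep y b;
run;
```

Each row of D3 then carries the value y and the indicator b that records which data set it was taken from.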

 

 

If this is a theoretical question, the answer is to look at the expected value of the proportions. You expect 0.3*120 = 36 observations from D1 and 0.7*120 = 84 observations from D2. Use those values and the sizes of D1 and D2 to answer the question.

Occasional Contributor
Posts: 10

Re: question

Congratulations, Rick_Sas, on all your badges!

Add me as a friend?

SAS Super FREQ
Posts: 3,900

Re: question

You can add me as a friend. That enables you to get notified (if you wish) of my activity, such as when I answer a question.

Occasional Contributor
Posts: 10

Re: question

Thank you, Rick, you're really the best!

 

Occasional Contributor
Posts: 17

question on estimating proportions

Posted in reply to librasantosh

 

Dataset D1 has 100 elements, generated as random numbers from a Normal distribution (mean=5, std=2).
Dataset D2 has 80 elements, generated as random numbers from a Normal distribution (mean=3, std=3).
Based on D1 and D2, dataset D3 has 120 elements, generated as a mixture of 30% Normal (mean=5, std=2) + 70% Normal (mean=3, std=3).

How can I estimate the proportions of D1 and D2 in dataset D3?

If you have any ideas, please let me know.

SAS Employee
Posts: 319

Re: question on estimating proportions

Posted in reply to librasantosh

Sounds like a job for PROC FMM. Given data set D3 with response values in Y:

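/* Fit a two-component mixture to the response Y in data set D3 */
proc fmm data=D3;
   model y = / k=2;   /* intercept-only model with k=2 components (normal by default) */
run;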
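
One way to get a feel for how well this recovers the mixing proportions is to fit the same model to simulated data where the true proportions are known. The sketch below is only an illustration: the data set name Sim, the sample size of 5000, and the seed are assumptions. Because the two components overlap heavily, the estimates from a sample of only 120 may be quite noisy, and the order in which the components are labeled in the output is arbitrary.

```sas
/* Simulate from a known 30%/70% mixture, then see what PROC FMM estimates. */
/* Data set name, sample size, and seed are illustrative only.              */
data Sim;
   call streaminit(2024);
   do i = 1 to 5000;
      if rand("Bernoulli", 0.3) then y = rand("Normal", 5, 2);   /* 30%: mean 5, std 2 */
      else                           y = rand("Normal", 3, 3);   /* 70%: mean 3, std 3 */
      output;
   end;
   keep y;
run;

proc fmm data=Sim;
   model y = / k=2;   /* two normal components, each with its own mean and variance */
run;
```

Compare the estimated mixing probabilities in the output with the 0.3 and 0.7 used in the simulation.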