Programming the statistical procedures from SAS

simulating a sample from an empirical distribution

Super Contributor
Posts: 412

Hi,

Suppose that I have a population of 1000 objects categorized into 3 different types, and that 50% of the objects are Type1, 30% are Type2 and the remaining 20% are Type3.

What I would like to do is simulate a sample of 100 objects and randomly classify them into the 3 types, so that the types follow the population distribution described above.

I found how to simulate samples from the built-in distributions, but I am having difficulty with an empirical distribution.

Thank you!


Accepted Solutions
Solution
‎05-21-2015 10:45 AM
SAS Super FREQ
Posts: 3,310

Re: simulating a sample from an empirical distribution

This is exactly the example that I use on p. 18-19 of my book Simulating Data with SAS.  The code is

%let N = 100;                             /* sample size */
data Table(keep=x);
   call streaminit(4321);                 /* seed for reproducibility */
   /* proportions in population: Type1, Type2, Type3 */
   array p[3] _temporary_ (0.5 0.3 0.2);
   do i = 1 to &N;
      x = rand("Table", of p[*]);         /* sample with replacement; x is 1, 2, or 3 */
      output;
   end;
run;

For an explanation and alternatives, see the blog post "Simulate categorical data in SAS."

By the way, the size of the population doesn't matter; only the proportions do.
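
To check that a simulated sample roughly follows the proportions from the question (50%, 30%, 20%), one option is to tabulate it. The following is only a sketch: the data set name Sample, the variable name Type, and the format TypeFmt are hypothetical names introduced for illustration.

```sas
/* Hypothetical format that labels the codes 1, 2, 3 returned by RAND("Table") */
proc format;
   value TypeFmt 1 = "Type1"  2 = "Type2"  3 = "Type3";
run;

data Sample(keep=Type);
   call streaminit(4321);
   /* proportions from the question: Type1, Type2, Type3 */
   array p[3] _temporary_ (0.5 0.3 0.2);
   do i = 1 to 100;
      Type = rand("Table", of p[*]);      /* returns 1, 2, or 3 */
      output;
   end;
   format Type TypeFmt.;
run;

/* Observed counts and percentages; expect values near, not equal to, 50/30/20 */
proc freq data=Sample;
   tables Type / nocum;
run;
```

Because each draw is independent, the observed percentages vary from one seed to another and get closer to the population proportions as the sample size grows.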


All Replies
Grand Advisor
Posts: 10,052

Re: simulating a sample from an empirical distribution

If you have a category variable, sort by that variable and then run:

proc surveyselect data=have out=want
     sampsize=100;
     strata category / alloc=prop;
run;

That might work.

Occasional Contributor
Posts: 7

Re: simulating a sample from an empirical distribution

That will give a sample with exactly the same distribution as the population (exactly 50 of Type 1, and so on). If you want a random sample instead, you also need to know whether you are sampling from a population of size 1000 (without replacement) or from an infinite population.

Briefly, these could be achieved by:

Population size = 1000: assign a random number to each observation, sort by it, and take the first 100.

Population size = infinite: generate 100 random integers between 1 and 1000. Merge this list (by a variable containing the observation number) with your data set of 1000, keeping the observations that appear in the list (this may include duplicates, because it is sampling with replacement).
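
The two schemes above could be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's code: the data sets Population, Shuffled, SampleWOR, and SampleWR and the variables id, Type, u, and k are hypothetical, and the with-replacement step reads rows directly with the POINT= option instead of the merge described above.

```sas
/* Hypothetical population: 1000 objects, 500 Type1, 300 Type2, 200 Type3 */
data Population;
   length Type $5;
   do id = 1 to 1000;
      if id <= 500 then Type = "Type1";
      else if id <= 800 then Type = "Type2";
      else Type = "Type3";
      output;
   end;
run;

/* Finite population, without replacement:
   attach a random key to every row, sort by it, keep the first 100 rows */
data Shuffled;
   call streaminit(4321);
   set Population;
   u = rand("Uniform");
run;

proc sort data=Shuffled;
   by u;
run;

data SampleWOR;
   set Shuffled(obs=100 drop=u);
run;

/* "Infinite" population, with replacement:
   draw 100 random row numbers and read those rows (duplicates allowed) */
data SampleWR;
   call streaminit(4321);
   do i = 1 to 100;
      k = ceil(nobs * rand("Uniform"));   /* random integer in 1..nobs */
      set Population point=k nobs=nobs;   /* direct access to row k */
      output;
   end;
   stop;                                  /* required when using POINT= */
   drop i;
run;
```

In both cases the type counts in the sample are random, unlike the proportional allocation, which fixes them at exactly 50, 30, and 20.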

Super Contributor
Posts: 412

Re: simulating a sample from an empirical distribution

omg I actually bought this book last year and even read section 2.4.5 Tabulated Distributions with the socks example (I even highlighted parts of the text)!!!!

It's just that since then I completely forgot about it...

Thanks Rick!!!
