Simulating a sample from an empirical distribution

05-20-2015 04:35 PM

Hi,

Suppose that I have a population of 1000 objects categorized into 3 types: 50% of the objects are Type1, 30% are Type2, and the remaining 20% are Type3.

What I would like to do is simulate a sample of 100 objects and randomly classify them into the 3 types, so that the types follow the population distribution described above.

I found how to simulate samples from built-in distributions, but I have difficulty with an empirical distribution.

Thank you!

Accepted Solutions

05-21-2015 10:45 AM - edited 09-08-2015 03:29 PM

This is exactly the example that I use on pp. 18-19 of my book *Simulating Data with SAS*. The code, with your proportions, is

```
%let N = 100;                              /* sample size */
data Table(keep=x);
   call streaminit(4321);
   array p[3] _temporary_ (0.5 0.3 0.2);   /* proportions in population: Type1, Type2, Type3 */
   do i = 1 to &N;
      x = rand("Table", of p[*]);          /* sample with replacement; x is 1, 2, or 3 */
      output;
   end;
run;
```

For an explanation and alternatives, see the blog post "Simulate categorical data in SAS."

By the way, the size of the population doesn't matter, only the proportions, because this approach samples with replacement.
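To check a simulated sample against the target proportions, you can tabulate it with PROC FREQ. The sketch below regenerates a sample of 100 with the question's 50/30/20 proportions so that it runs on its own (the data set name `Sample`, the format name `TypeFmt`, and the seed are illustrative choices, not taken from the book) and requests a chi-square goodness-of-fit test with the TESTP= option.

```sas
/* Illustrative check: simulate 100 draws with the question's proportions */
%let N = 100;
data Sample(keep=x);
   call streaminit(4321);
   array p[3] _temporary_ (0.5 0.3 0.2);   /* Type1, Type2, Type3 */
   do i = 1 to &N;
      x = rand("Table", of p[*]);
      output;
   end;
run;

/* Hypothetical format that labels the category codes 1, 2, 3 */
proc format;
   value TypeFmt 1 = "Type1"
                 2 = "Type2"
                 3 = "Type3";
run;

/* Observed frequencies plus a goodness-of-fit test against 50/30/20;  */
/* TESTP= values are matched to the levels in order 1, 2, 3            */
proc freq data=Sample;
   format x TypeFmt.;
   tables x / nocum chisq testp=(0.5 0.3 0.2);
run;
```

Because the draws are random, the observed percentages will vary around 50/30/20 from one seed to the next; that variation is exactly what distinguishes this approach from the fixed-count stratified sample discussed below.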

All Replies

05-20-2015 04:45 PM

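If you have a category variable, sort by that variable and then try:

```
proc sort data=have;
   by category;
run;

proc surveyselect data=have out=want sampsize=100;
   strata category / alloc=prop;   /* allocate the 100 draws in proportion to stratum sizes */
run;
```

That might work.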

05-20-2015 06:21 PM

That will give a sample with *exactly* the same distribution as the population (exactly 50 that are Type1, and so on). If you want a random sample instead, you also need to decide whether you are sampling from a population of size 1000 (without replacement) or from an infinite population (with replacement).

Briefly, these could be achieved by:

Population size = 1000: assign a random number to each observation, sort by that number, and take the first 100.
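Here is a minimal sketch of the without-replacement recipe. It assumes a hypothetical population data set `have` with an ID variable `id` and a character variable `category`; the first DATA step builds one with 500 Type1, 300 Type2, and 200 Type3 objects, and the seed is arbitrary.

```sas
/* Hypothetical population: id 1-1000 with 50% Type1, 30% Type2, 20% Type3 */
data have;
   length category $5;
   do id = 1 to 1000;
      if id <= 500 then category = "Type1";
      else if id <= 800 then category = "Type2";
      else category = "Type3";
      output;
   end;
run;

/* Attach a uniform random key to every observation */
data keyed;
   if _n_ = 1 then call streaminit(4321);
   set have;
   u = rand("Uniform");
run;

/* Sorting by the random key shuffles the population */
proc sort data=keyed;
   by u;
run;

/* The first 100 shuffled observations form a sample without replacement */
data want;
   set keyed(obs=100);
   drop u;
run;
```

PROC SURVEYSELECT with METHOD=SRS and SAMPSIZE=100 draws the same kind of simple random sample without replacement in a single step, if you prefer not to manage the random key yourself.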

Population size = infinite: generate 100 random integers between 1 and 1000. Merge this list (by a variable containing the observation number) with your data set of 1000 observations, keeping only the observations whose numbers appear in the random list. The result may include duplicates, because this is sampling with replacement.
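A minimal sketch of the with-replacement recipe, again assuming a hypothetical population data set `have` (built in the first step) whose observations are numbered by `id`. The random observation numbers come from CEIL(1000*RAND("Uniform")), which yields integers from 1 to 1000 because RAND("Uniform") returns values strictly between 0 and 1.

```sas
/* Hypothetical population: id 1-1000 with 50% Type1, 30% Type2, 20% Type3 */
data have;
   length category $5;
   do id = 1 to 1000;
      if id <= 500 then category = "Type1";
      else if id <= 800 then category = "Type2";
      else category = "Type3";
      output;
   end;
run;

/* Draw 100 observation numbers between 1 and 1000; repeats are allowed */
data picks(keep=id);
   call streaminit(4321);
   do draw = 1 to 100;
      id = ceil(1000 * rand("Uniform"));
      output;
   end;
run;

/* MERGE with BY requires both inputs sorted by id; have is already in order */
proc sort data=picks;
   by id;
run;

/* Keep only picked observations; a repeated id yields a repeated row */
data want;
   merge picks(in=inPick) have;
   by id;
   if inPick;
run;
```

Because `picks` can contain the same `id` more than once while `have` contains each `id` once, this is a one-to-many merge, so `want` always has 100 rows.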

05-21-2015 11:25 AM

OMG, I actually bought this book last year and even read Section 2.4.5, "Tabulated Distributions," with the socks example (I even highlighted parts of the text)!

It's just that since then I completely forgot about it...

Thanks Rick!!!