ilikesas
Barite | Level 11

Hi,

Suppose that I have a population of 1000 objects categorized into 3 types: 50% of the objects are Type1, 30% are Type2, and the remaining 20% are Type3.

What I would like to do is simulate a sample of 100 objects and randomly classify each one into one of the 3 types, so that the types follow the population distribution described above.

I found how to simulate samples from built-in distributions, but I have difficulty with an empirical distribution.

Thank you!

Accepted Solution
Rick_SAS
SAS Super FREQ

This is exactly the example that I use on p. 18-19 of my book Simulating Data with SAS. The code is:

%let N = 100;                            /* sample size */
data Table(keep=x);
call streaminit(4321);                   /* set the seed for reproducibility */
array p[3] _temporary_ (0.5 0.3 0.2);    /* proportions of Type1, Type2, Type3 in population */
do i = 1 to &N;
   x = rand("Table", of p[*]);           /* x is 1, 2, or 3; sample with replacement */
   output;
end;
run;

For an explanation and alternatives, see the blog post "Simulate categorical data in SAS."

By the way, because this samples with replacement, the size of the population doesn't matter. Only the proportions do.
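To check a simulated sample against the target proportions, something like the following sketch might help. It is not from the book or the blog post. The data set name Sample, the character variable Type, and the seed are assumptions chosen for illustration, and the proportions are those stated in the question.

```sas
%let N = 100;                              /* sample size */

/* Simulate N type labels from the proportions in the question */
data Sample(keep=Type);
   call streaminit(4321);                  /* arbitrary seed, for reproducibility */
   array p[3] _temporary_ (0.5 0.3 0.2);   /* Type1, Type2, Type3 */
   length Type $5;
   do i = 1 to &N;
      /* rand("Table") returns 1, 2, or 3; cats() builds the label */
      Type = cats("Type", rand("Table", of p[*]));
      output;
   end;
run;

/* Compare observed frequencies with the target proportions */
proc freq data=Sample;
   tables Type / nocum chisq testp=(0.5 0.3 0.2);
run;
```

The TESTP= values are matched to the levels in sorted order (Type1, Type2, Type3), so this assumes every type appears at least once in the sample. The observed counts will usually be close to 50/30/20 but not exactly equal to them.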

Replies
ballardw
Super User

If you have a category variable, sort by it and then run:

proc surveyselect data=have out=want
     sampsize=100;
     strata category / alloc=prop;
run;

That might work.
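A self-contained sketch of this stratified approach follows. The data set Population, its variables id and Type, and the seed are hypothetical, built here only so the step can run. Substitute your own data set and category variable.

```sas
/* Hypothetical population: 1000 objects, 500/300/200 by type */
data Population;
   length Type $5;
   do id = 1 to 1000;
      if id <= 500 then Type = "Type1";
      else if id <= 800 then Type = "Type2";
      else Type = "Type3";
      output;
   end;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=Population;
   by Type;
run;

/* Total sample of 100, allocated to strata in proportion to their size */
proc surveyselect data=Population out=Stratified
                  method=srs sampsize=100 seed=4321;
   strata Type / alloc=prop;
run;

/* Inspect how many objects of each type were selected */
proc freq data=Stratified;
   tables Type / nocum;
run;
```

With proportional allocation the number selected from each stratum is fixed by the design, so only which objects are chosen within each type is random.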

KenDodds
Calcite | Level 5

That will give a sample with exactly the same distribution as the population (exactly 50 that are Type 1, etc.). If you want to randomly sample, then you also need to know whether you are sampling from a population of size 1000 (without replacement) or from an infinite population (with replacement).

Briefly, these could be achieved by:

Population size = 1000: assign a random number to each observation, sort by it, and take the first 100.

Population size = infinite: generate 100 random integers between 1 and 1000. Merge this list (by a variable containing the observation number) with your data set of 1000, keeping the observations whose numbers are in the list. The list may include duplicates, because this is sampling with replacement.
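The two schemes above can be sketched roughly as follows. The data set Population and all variable names are hypothetical. For the with-replacement case the sketch reads rows directly with the POINT= option instead of the merge described above, which should give an equivalent sample.

```sas
/* Hypothetical population: 1000 objects, 500/300/200 by type */
data Population;
   length Type $5;
   do id = 1 to 1000;
      if id <= 500 then Type = "Type1";
      else if id <= 800 then Type = "Type2";
      else Type = "Type3";
      output;
   end;
run;

/* Without replacement: attach a random key, sort by it, keep the first 100 */
data Keyed;
   set Population;
   if _n_ = 1 then call streaminit(4321);   /* seed once */
   u = rand("Uniform");
run;

proc sort data=Keyed;
   by u;
run;

data SampleWOR;
   set Keyed(obs=100 drop=u);
run;

/* With replacement: draw 100 random row numbers and read those rows */
data SampleWR;
   call streaminit(4321);
   do draw = 1 to 100;
      k = ceil(n * rand("Uniform"));         /* random integer in 1..n */
      set Population point=k nobs=n;         /* read row k; duplicates possible */
      output;
   end;
   stop;                                     /* required when using POINT= */
   drop draw;
run;

/* Type counts vary from sample to sample in both schemes */
proc freq data=SampleWOR; tables Type / nocum; run;
proc freq data=SampleWR;  tables Type / nocum; run;
```

PROC SURVEYSELECT with METHOD=SRS or METHOD=URS is another way to draw such samples, if you prefer a procedure to a DATA step.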

ilikesas
Barite | Level 11

omg I actually bought this book last year and even read section 2.4.5 Tabulated Distributions with the socks example (I even highlighted parts of the text)!!!!

It's just that since then I completely forgot about it...

Thanks Rick!!!
