turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Monte Carlo / Random number

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 10:25 AM

This is the problem, I have a list of numbers and I want to randomly put them into three buckets. This will be a no replacement. I want 50% of the numbers to go in bucket A, 30% to go into bucket B, and 20% to go in bucket C. From this I want to know the mean of each bucket, then run it 100 times and have a sheet that says the mean of each bucket for each run. I have no idea where to start. Please help.

Thank you!

Accepted Solutions

Solution

02-05-2015
03:18 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 03:18 PM

I'm using SAS 9.4. This would certainly work in 9.2 :

**data test;**

**call streaminit(876857);**

**do id = 1 to 64;**

** x = rand("NORMAL");**

** output;**

** end;**

**run;**

**proc sql;**

**select **

** round(0.5*count(*)) as n50, **

** round(0.3*count(*)) as n30**

**into :n50, :n30**

**from test;**

**quit;**

**data sample;**

**set test;**

**do replicate = 1 to 100;**

** rnd = rand("UNIFORM");**

** output;**

** end;**

**run;**

**proc sort data=sample; by replicate rnd; run;**

**data sampleGroups;**

**set sample; by replicate;**

**if first.replicate then n = 0;**

**n + 1;**

**if n <= &n50 then bucket = "A";**

**else if n <= &n50 + &n30 then bucket = "B";**

**else bucket = "C";**

**drop n rnd;**

**run;**

**proc sql;**

**create table stats as**

**select replicate, bucket, mean(x) as meanX**

**from sampleGroups**

**group by replicate, bucket;**

**quit;**

PG

PG

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 10:44 AM

Homework?

You can do this in Base SAS (DATA step + PROC MEANS) or in SAS/IML. By coincidence, those are the same percentages I use in my book *Simulating Data with SAS*.

To do the experiment one time, look at the article "Simulate categorical data in SAS".

To write this as a Monte Carlo simulation, look at the article "Simulation in SAS: The slow way or the BY way"

Anyone who has a copy of my book can look on p. 130-131 for an interesting twist to this problem.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 11:23 AM

Not homework, for actual work. I guess I'm not following on how to create multiple sets. I see how to create one set. Can you elaborate?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 01:16 PM

A stub of an example;

Data one two three;

/* get values from somewhere*/

if category=1 then output one;

else if category=2 then output two;

else if category=3 then output three;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 02:30 PM

Sorry for the confusion. Somehow I misread the phrase "I have a list of numbers." You are doing resampling rather than pure simulation.

There is always an issue with questions like this: Do you want EXACTLY 50%, 30%, 20%, or do you want to sample with probability 0.5, 0.3, and 0.2. I assume the latter.

As PG says, the general tool for resampling is PROC SURVEYSELECT. For this simple case you could also do the sampling yourself by assigning the "buckets" in a DATA step:

data test;

keep x;

call streaminit(876857);

do i = 1 to 64;

x = rand("NORMAL");

output;

end;

run;

/* randomly assign numbers to bins; repeat 100 times */

%let NumSamples = 100;

data Many;

set test;

do SampleID = 1 to &NumSamples; /* 1. duplicate sample */

bin = rand("Table", 0.5, 0.3, 0.2); /* assign to bin */

output;

end;

run;

/* prepare for BY-group analysis */

proc sort data=Many;

by SampleID;

run;

/* create "sheet that says the mean of each bucket for each run" */

proc means data=Many noprint;

by SampleID; /* 2. compute many statistics */

class bin;

var x;

output out=OutStats(where=(_TYPE_=1)) mean=SampleMean;

run;

proc print data=OutStats;

var SampleID bin _FREQ_ SampleMean;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 02:41 PM

Oh, I did not mean probability. I meant if my list was 10 then A would have 5 of the numbers, B would have 3 of the numbers, and C would have the last two numbers in it.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 02:53 PM

I'm very sorry for the confusion.

If this was my list

DRG |

2 |

2 |

2 |

3 |

5 |

7 |

13 |

19 |

22 |

54 |

For example one run would have

A | B | C |

2 | 2 | 2 |

3 | 5 | 7 |

13 | 19 | |

22 | ||

54 |

Then I want to know the mean of each and say

Run Average | A | B | C |

Run 1 Average | 18.8 | 8.666667 | 4.5 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 03:00 PM

I think Ricks code is correct, what if you had 20 observations, you'd want 10 in the first group, 6 in the second and 4 in the third. That's what the probability does in his code.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 01:09 PM

Dr. Wicklin's text, "Simulating Data Using SAS", is phenomenal!

I highly recommend it.

Brian

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 01:35 PM

Use **proc surveyselect** with the **GROUPS=()** option :

**/* Test data */**

**data test;**

**call streaminit(876857);**

**do id = 1 to 64;**

** x = rand("NORMAL");**

** output;**

** end;**

**run;**

**/* Calculate the bucket sizes */**

**proc sql;**

**select **

** round(0.5*count(*)) as n50, **

** round(0.3*count(*)) as n30,**

** count(*) - calculated n50 - calculated n30 as n20**

**into :n50, :n30, :n20**

**from test;**

**quit;**

**/* Generate random groups, 100 times */**

**proc surveyselect data=test out=sample rep=100 groups=(&n50 &n30 &n20); run;**

**/* Calculate group means */**

**proc sql;**

**create table stats as**

**select replicate, groupID as bucket, mean(x) as meanX**

**from sample**

**group by replicate, groupID;**

**quit;**

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 02:11 PM

PGStats

The groups option is causing a syntax error. It seems groups is not an option for this proc SAS/STAT(R) 9.2 User's Guide, Second Edition

It is possible it is another option?

Thank you so much.

Solution

02-05-2015
03:18 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 03:18 PM

I'm using SAS 9.4. This would certainly work in 9.2 :

**data test;**

**call streaminit(876857);**

**do id = 1 to 64;**

** x = rand("NORMAL");**

** output;**

** end;**

**run;**

**proc sql;**

**select **

** round(0.5*count(*)) as n50, **

** round(0.3*count(*)) as n30**

**into :n50, :n30**

**from test;**

**quit;**

**data sample;**

**set test;**

**do replicate = 1 to 100;**

** rnd = rand("UNIFORM");**

** output;**

** end;**

**run;**

**proc sort data=sample; by replicate rnd; run;**

**data sampleGroups;**

**set sample; by replicate;**

**if first.replicate then n = 0;**

**n + 1;**

**if n <= &n50 then bucket = "A";**

**else if n <= &n50 + &n30 then bucket = "B";**

**else bucket = "C";**

**drop n rnd;**

**run;**

**proc sql;**

**create table stats as**

**select replicate, bucket, mean(x) as meanX**

**from sampleGroups**

**group by replicate, bucket;**

**quit;**

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2015 04:06 AM

I am considering using proc rank .

data class; set sashelp.class; ran=ranuni(0); keep ran; run; proc rank data=class out=temp groups=10; var ran; ranks rank; run; data want; merge temp(where=(rank in (0:4)) rename=(ran=A)) temp(where=(rank in (5:7)) rename=(ran=B)) temp(where=(rank in (8 9)) rename=(ran=C)) ; drop rank; run;

Xia Keshan