Solved: Re: Monte Carlo / Random number

weweaw · Posted 02-05-2015 10:25 AM

Here is the problem: I have a list of numbers and I want to randomly assign them to three buckets, without replacement. I want 50% of the numbers to go into bucket A, 30% into bucket B, and 20% into bucket C. From this I want to know the mean of each bucket, then repeat the process 100 times and produce a sheet that lists the mean of each bucket for each run. I have no idea where to start. Please help.

Thank you!

Rick_SAS · Posted 02-05-2015 10:44 AM

Homework?

You can do this in Base SAS (DATA step + PROC MEANS) or in SAS/IML. By coincidence, those are the same percentages I use in my book Simulating Data with SAS.

To do the experiment one time, look at the article "Simulate categorical data in SAS".

To write this as a Monte Carlo simulation, look at the article "Simulation in SAS: The slow way or the BY way".

Anyone who has a copy of my book can look on pp. 130-131 for an interesting twist to this problem.

weweaw · Posted 02-05-2015 11:23 AM

Not homework, this is for actual work. I guess I'm not following how to create multiple sets. I see how to create one set. Can you elaborate?

ballardw · Posted 02-05-2015 01:16 PM

A stub of an example:

data one two three;
   /* get values from somewhere */
   if category=1 then output one;
   else if category=2 then output two;
   else if category=3 then output three;
run;

Rick_SAS · Posted 02-05-2015 02:30 PM

Sorry for the confusion. Somehow I misread the phrase "I have a list of numbers." You are doing resampling rather than pure simulation.

There is always an issue with questions like this: Do you want EXACTLY 50%, 30%, and 20%, or do you want to sample with probability 0.5, 0.3, and 0.2? I assume the latter.

As PG says, the general tool for resampling is PROC SURVEYSELECT. For this simple case you could also do the sampling yourself by assigning the "buckets" in a DATA step:

data test;
   keep x;
   call streaminit(876857);
   do i = 1 to 64;
      x = rand("NORMAL");
      output;
   end;
run;

/* randomly assign numbers to bins; repeat 100 times */
%let NumSamples = 100;
data Many;
   set test;
   do SampleID = 1 to &NumSamples;         /* 1. duplicate sample */
      bin = rand("Table", 0.5, 0.3, 0.2);  /* assign to bin */
      output;
   end;
run;

/* prepare for BY-group analysis */
proc sort data=Many;
   by SampleID;
run;

/* create "sheet that says the mean of each bucket for each run" */
proc means data=Many noprint;
   by SampleID;                  /* 2. compute many statistics */
   class bin;
   var x;
   output out=OutStats(where=(_TYPE_=1)) mean=SampleMean;
run;

proc print data=OutStats;
   var SampleID bin _FREQ_ SampleMean;
run;
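
To make the "exactly 50/30/20" versus "probability 0.5/0.3/0.2" distinction concrete, one could tabulate how many values land in each bin per sample. This is only a sketch: it assumes the sorted data set Many from the code above, and the name BinCounts is made up for illustration.

```sas
/* Sketch: count the values that fall in each bin, per sample.           */
/* Assumes Many (variables SampleID, bin) exists and is sorted by        */
/* SampleID, as in the code above. BinCounts is a hypothetical name.     */
proc freq data=Many noprint;
   by SampleID;
   tables bin / out=BinCounts;   /* COUNT and PERCENT for each bin */
run;

/* Summarize how much the bin sizes vary across the samples */
proc means data=BinCounts min mean max;
   class bin;
   var count;
run;
```

With rand("Table"), the counts per bin should average about 32, 19, and 13 for 64 values, but they will generally differ from sample to sample. If the buckets must have fixed sizes, a different approach is needed.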

weweaw · Posted 02-05-2015 02:41 PM

Oh, I did not mean probability. I meant that if my list had 10 numbers, then A would get 5 of them, B would get 3, and C would get the last 2.

weweaw · Posted 02-05-2015 02:53 PM

I'm very sorry for the confusion.

If this were my list:

DRG
2
2
2
3
5
7
13
19
22
54

then, for example, one run would have:

A	B	C
2	2	2
3	5	7
13	19
22
54

Then I want to know the mean of each bucket and report, say:

Run Average	A	B	C
Run 1 Average	18.8	8.666667	4.5
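
For reference, the averages in this example can be checked with a short step. This is a minimal sketch: the data set name run1 and the variable name bucket are assumptions made for illustration, and the bucket assignments are simply typed in from the example run above.

```sas
/* Sketch: the single example run above, entered by hand.            */
/* run1 and bucket are hypothetical names, not from the thread.      */
data run1;
   input bucket $ DRG;
   datalines;
A 2
A 3
A 13
A 22
A 54
B 2
B 5
B 19
C 2
C 7
;
run;

/* Mean of DRG within each bucket */
proc means data=run1 mean maxdec=4;
   class bucket;
   var DRG;
run;
```

The means work out to 94/5 = 18.8 for A, 26/3 = 8.6667 for B, and 9/2 = 4.5 for C, matching the row above.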

Reeza · Posted 02-05-2015 03:00 PM

I think Rick's code is correct. What if you had 20 observations? You'd want 10 in the first group, 6 in the second, and 4 in the third. That's what the probability does in his code.

bhfield · Posted 02-05-2015 01:09 PM

Dr. Wicklin's text, "Simulating Data with SAS", is phenomenal!

I highly recommend it.

Brian

PGStats · Posted 02-05-2015 01:35 PM

Use proc surveyselect with the GROUPS=() option:

/* Test data */
data test;
   call streaminit(876857);
   do id = 1 to 64;
      x = rand("NORMAL");
      output;
   end;
run;

/* Calculate the bucket sizes */
proc sql;
   select
      round(0.5*count(*)) as n50,
      round(0.3*count(*)) as n30,
      count(*) - calculated n50 - calculated n30 as n20
   into :n50, :n30, :n20
   from test;
quit;

/* Generate random groups, 100 times */
proc surveyselect data=test out=sample rep=100 groups=(&n50 &n30 &n20); run;

/* Calculate group means */
proc sql;
   create table stats as
   select replicate, groupID as bucket, mean(x) as meanX
   from sample
   group by replicate, groupID;
quit;

PG

weweaw · Posted 02-05-2015 02:11 PM

PGStats

The GROUPS= option is causing a syntax error. According to the SAS/STAT(R) 9.2 User's Guide, Second Edition, it does not seem to be an option for this procedure.

Is it possibly a different option?

Thank you so much.

PGStats · Posted 02-05-2015 03:18 PM

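I'm using SAS 9.4. This would certainly work in 9.2:

/* Test data */
data test;
   call streaminit(876857);
   do id = 1 to 64;
      x = rand("NORMAL");
      output;
   end;
run;

/* Store the sizes of the 50% and 30% buckets in macro variables */
proc sql;
   select
      round(0.5*count(*)) as n50,
      round(0.3*count(*)) as n30
   into :n50, :n30
   from test;
quit;

/* Make 100 copies of each observation, each with its own random key */
data sample;
   set test;
   do replicate = 1 to 100;
      rnd = rand("UNIFORM");
      output;
   end;
run;

/* Sorting by the random key shuffles the values within each replicate */
proc sort data=sample; by replicate rnd; run;

/* Assign buckets by position in the shuffled order */
data sampleGroups;
   set sample; by replicate;
   if first.replicate then n = 0;
   n + 1;
   if n <= &n50 then bucket = "A";
   else if n <= &n50 + &n30 then bucket = "B";
   else bucket = "C";   /* the remaining observations */
   drop n rnd;
run;

/* Mean of each bucket for each replicate */
proc sql;
   create table stats as
   select replicate, bucket, mean(x) as meanX
   from sampleGroups
   group by replicate, bucket;
quit;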

PG
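
The stats table above has one row per replicate and bucket. If the goal is the "sheet" from the original question, with one row per run and one column per bucket, a transpose along the following lines might do it. This is a sketch, not part of the posted solution: the output name sheet and the prefix mean_ are assumptions chosen for illustration.

```sas
/* Sketch: reshape stats (replicate, bucket, meanX) to one row per run. */
/* sheet and the mean_ prefix are hypothetical names.                   */
proc sort data=stats;
   by replicate bucket;   /* BY processing needs sorted input */
run;

proc transpose data=stats out=sheet(drop=_name_) prefix=mean_;
   by replicate;          /* one output row per replicate          */
   id bucket;             /* bucket values A, B, C become columns  */
   var meanX;
run;
```

With character bucket values A, B, and C, the result should have the columns replicate, mean_A, mean_B, and mean_C.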

Ksharp · Posted 02-06-2015 04:06 AM

I am considering using proc rank.

data class;
 set sashelp.class;
 ran=ranuni(0);
 keep ran;
run;
proc rank data=class out=temp groups=10;
 var ran;
 ranks rank;
run;
data want;
 merge temp(where=(rank in (0:4)) rename=(ran=A))
       temp(where=(rank in (5:7)) rename=(ran=B))
       temp(where=(rank in (8 9)) rename=(ran=C));
 drop rank;
run;

Xia Keshan
