BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
weweaw
Obsidian | Level 7

This is the problem, I have a list of numbers and I want to randomly put them into three buckets.  This will be a no replacement.  I want 50% of the numbers to go in bucket A, 30% to go into bucket B, and 20% to go in bucket C.  From this I want to know the mean of each bucket, then run it 100 times and have a sheet that says the mean of each bucket for each run.  I have no idea where to start.  Please help. 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I'm using SAS 9.4. This would certainly work in 9.2 :

data test;

call streaminit(876857);

do id = 1 to 64;

    x = rand("NORMAL");

    output;

    end;

run;

proc sql;

select

    round(0.5*count(*)) as n50,

    round(0.3*count(*)) as n30

into :n50, :n30

from test;

quit;

data sample;

set test;

do replicate = 1 to 100;

    rnd = rand("UNIFORM");

    output;

    end;

run;

proc sort data=sample; by replicate rnd; run;

data sampleGroups;

set sample; by replicate;

if first.replicate then n = 0;

n + 1;

if n <= &n50 then bucket = "A";

else if n <= &n50 + &n30 then bucket = "B";

else bucket = "C";

drop n rnd;

run;

proc sql;

create table stats as

select replicate, bucket, mean(x) as meanX

from sampleGroups

group by replicate, bucket;

quit;

PG

PG

View solution in original post

12 REPLIES 12
Rick_SAS
SAS Super FREQ

Homework?

You can do this in Base SAS (DATA step + PROC MEANS) or in SAS/IML. By coincidence, those are the same percentages I use in my book Simulating Data with SAS.

To do the experiment one time, look at the article "Simulate categorical data in SAS".

To write this as a Monte Carlo simulation, look at the article "Simulation in SAS: The slow way or the BY way"

Anyone who has a copy of my book can look on p. 130-131 for an interesting twist to this problem.

weweaw
Obsidian | Level 7

Not homework, for actual work.  I guess I'm not following on how to create multiple sets.  I see how to create one set.  Can you elaborate? 

ballardw
Super User

A stub of an example;

Data one two three;

/* get values from somewhere*/

if category=1 then output one;

else if category=2 then output two;

else if category=3 then output three;

run;

Rick_SAS
SAS Super FREQ

Sorry for the confusion. Somehow I misread the phrase "I have a list of numbers."  You are doing resampling rather than pure simulation.

There is always an issue with questions like this: Do you want EXACTLY 50%, 30%, 20%, or do you want to sample with probability 0.5, 0.3, and 0.2.  I assume the latter.

As PG says, the general tool for resampling is PROC SURVEYSELECT. For this simple case you could also do the sampling yourself by assigning the "buckets" in a DATA step:

data test;
keep x;
call streaminit(876857);
do i = 1 to 64;
   x = rand("NORMAL");
   output;
   end;
run;

/* randomly assign numbers to bins; repeat 100 times */
%let NumSamples = 100;
data Many;
set test;
do SampleID = 1 to &NumSamples;  /* 1. duplicate sample */
   bin = rand("Table", 0.5, 0.3, 0.2); /* assign to bin */
   output;
end;
run;

/* prepare for BY-group analysis */
proc sort data=Many;
by SampleID;
run;

/* create "sheet that says the mean of each bucket for each run" */
proc means data=Many noprint;
   by SampleID;                  /* 2. compute many statistics */
   class bin;
   var x;
   output out=OutStats(where=(_TYPE_=1)) mean=SampleMean;
run;

proc print data=OutStats;
var SampleID bin _FREQ_ SampleMean;
run;

weweaw
Obsidian | Level 7

Oh, I did not mean probability.  I meant if my list was 10 then A would have 5 of the numbers, B would have 3 of the numbers, and C would have the last two numbers in it.

weweaw
Obsidian | Level 7

I'm very sorry for the confusion.

If this was my list

DRG
2
2
2
3
5
7
13
19
22

54

For example one run would have

ABC
222
357
1319
22
54

Then I want to know the mean of each and say

Run AverageABC
Run 1 Average18.88.6666674.5
Reeza
Super User

I think Ricks code is correct, what if you had 20 observations, you'd want 10 in the first group, 6 in the second and 4 in the third. That's what the probability does in his code.

bhfield
Calcite | Level 5

Dr. Wicklin's text, "Simulating Data Using SAS", is phenomenal!

I highly recommend it.


Brian

PGStats
Opal | Level 21

Use proc surveyselect with the GROUPS=() option :

/* Test data */

data test;

call streaminit(876857);

do id = 1 to 64;

    x = rand("NORMAL");

    output;

    end;

run;

/* Calculate the bucket sizes */

proc sql;

select

    round(0.5*count(*)) as n50,

    round(0.3*count(*)) as n30,

    count(*) - calculated n50 - calculated n30 as n20

into :n50, :n30, :n20

from test;

quit;

/* Generate random groups, 100 times */

proc surveyselect data=test out=sample rep=100 groups=(&n50 &n30 &n20); run;

/* Calculate group means */

proc sql;

create table stats as

select replicate, groupID as bucket, mean(x) as meanX

from sample

group by replicate, groupID;

quit;

PG

PG
weweaw
Obsidian | Level 7

PGStats

The groups option is causing a syntax error.  It seems groups is not an option for this proc SAS/STAT(R) 9.2 User's Guide, Second Edition

It is possible it is another option?

Thank you so much.

PGStats
Opal | Level 21

I'm using SAS 9.4. This would certainly work in 9.2 :

data test;

call streaminit(876857);

do id = 1 to 64;

    x = rand("NORMAL");

    output;

    end;

run;

proc sql;

select

    round(0.5*count(*)) as n50,

    round(0.3*count(*)) as n30

into :n50, :n30

from test;

quit;

data sample;

set test;

do replicate = 1 to 100;

    rnd = rand("UNIFORM");

    output;

    end;

run;

proc sort data=sample; by replicate rnd; run;

data sampleGroups;

set sample; by replicate;

if first.replicate then n = 0;

n + 1;

if n <= &n50 then bucket = "A";

else if n <= &n50 + &n30 then bucket = "B";

else bucket = "C";

drop n rnd;

run;

proc sql;

create table stats as

select replicate, bucket, mean(x) as meanX

from sampleGroups

group by replicate, bucket;

quit;

PG

PG
Ksharp
Super User

I am considering using proc rank .Smiley Happy

data class;
 set sashelp.class;
 ran=ranuni(0);
 keep ran;
run;
proc rank data=class out=temp groups=10;
 var ran;
 ranks rank;
run;
data want;
 merge temp(where=(rank in (0:4)) rename=(ran=A))
       temp(where=(rank in (5:7)) rename=(ran=B))
        temp(where=(rank in (8 9)) rename=(ran=C)) ;
 drop rank;
run;

Xia Keshan

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2752 views
  • 1 like
  • 7 in conversation