BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
weweaw
Obsidian | Level 7

This is the problem, I have a list of numbers and I want to randomly put them into three buckets.  This will be a no replacement.  I want 50% of the numbers to go in bucket A, 30% to go into bucket B, and 20% to go in bucket C.  From this I want to know the mean of each bucket, then run it 100 times and have a sheet that says the mean of each bucket for each run.  I have no idea where to start.  Please help. 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I'm using SAS 9.4. This would certainly work in 9.2 :

data test;

call streaminit(876857);

do id = 1 to 64;

    x = rand("NORMAL");

    output;

    end;

run;

proc sql;

select

    round(0.5*count(*)) as n50,

    round(0.3*count(*)) as n30

into :n50, :n30

from test;

quit;

data sample;

set test;

do replicate = 1 to 100;

    rnd = rand("UNIFORM");

    output;

    end;

run;

proc sort data=sample; by replicate rnd; run;

data sampleGroups;

set sample; by replicate;

if first.replicate then n = 0;

n + 1;

if n <= &n50 then bucket = "A";

else if n <= &n50 + &n30 then bucket = "B";

else bucket = "C";

drop n rnd;

run;

proc sql;

create table stats as

select replicate, bucket, mean(x) as meanX

from sampleGroups

group by replicate, bucket;

quit;

PG

PG

View solution in original post

12 REPLIES 12
Rick_SAS
SAS Super FREQ

Homework?

You can do this in Base SAS (DATA step + PROC MEANS) or in SAS/IML. By coincidence, those are the same percentages I use in my book Simulating Data with SAS.

To do the experiment one time, look at the article "Simulate categorical data in SAS".

To write this as a Monte Carlo simulation, look at the article "Simulation in SAS: The slow way or the BY way"

Anyone who has a copy of my book can look on p. 130-131 for an interesting twist to this problem.

weweaw
Obsidian | Level 7

Not homework, for actual work.  I guess I'm not following on how to create multiple sets.  I see how to create one set.  Can you elaborate? 

ballardw
Super User

A stub of an example;

Data one two three;

/* get values from somewhere*/

if category=1 then output one;

else if category=2 then output two;

else if category=3 then output three;

run;

Rick_SAS
SAS Super FREQ

Sorry for the confusion. Somehow I misread the phrase "I have a list of numbers."  You are doing resampling rather than pure simulation.

There is always an issue with questions like this: Do you want EXACTLY 50%, 30%, 20%, or do you want to sample with probability 0.5, 0.3, and 0.2.  I assume the latter.

As PG says, the general tool for resampling is PROC SURVEYSELECT. For this simple case you could also do the sampling yourself by assigning the "buckets" in a DATA step:

data test;
keep x;
call streaminit(876857);
do i = 1 to 64;
   x = rand("NORMAL");
   output;
   end;
run;

/* randomly assign numbers to bins; repeat 100 times */
%let NumSamples = 100;
data Many;
set test;
do SampleID = 1 to &NumSamples;  /* 1. duplicate sample */
   bin = rand("Table", 0.5, 0.3, 0.2); /* assign to bin */
   output;
end;
run;

/* prepare for BY-group analysis */
proc sort data=Many;
by SampleID;
run;

/* create "sheet that says the mean of each bucket for each run" */
proc means data=Many noprint;
   by SampleID;                  /* 2. compute many statistics */
   class bin;
   var x;
   output out=OutStats(where=(_TYPE_=1)) mean=SampleMean;
run;

proc print data=OutStats;
var SampleID bin _FREQ_ SampleMean;
run;

weweaw
Obsidian | Level 7

Oh, I did not mean probability.  I meant if my list was 10 then A would have 5 of the numbers, B would have 3 of the numbers, and C would have the last two numbers in it.

weweaw
Obsidian | Level 7

I'm very sorry for the confusion.

If this was my list

DRG
2
2
2
3
5
7
13
19
22

54

For example one run would have

ABC
222
357
1319
22
54

Then I want to know the mean of each and say

Run AverageABC
Run 1 Average18.88.6666674.5
Reeza
Super User

I think Ricks code is correct, what if you had 20 observations, you'd want 10 in the first group, 6 in the second and 4 in the third. That's what the probability does in his code.

bhfield
Calcite | Level 5

Dr. Wicklin's text, "Simulating Data Using SAS", is phenomenal!

I highly recommend it.


Brian

PGStats
Opal | Level 21

Use proc surveyselect with the GROUPS=() option :

/* Test data */

data test;

call streaminit(876857);

do id = 1 to 64;

    x = rand("NORMAL");

    output;

    end;

run;

/* Calculate the bucket sizes */

proc sql;

select

    round(0.5*count(*)) as n50,

    round(0.3*count(*)) as n30,

    count(*) - calculated n50 - calculated n30 as n20

into :n50, :n30, :n20

from test;

quit;

/* Generate random groups, 100 times */

proc surveyselect data=test out=sample rep=100 groups=(&n50 &n30 &n20); run;

/* Calculate group means */

proc sql;

create table stats as

select replicate, groupID as bucket, mean(x) as meanX

from sample

group by replicate, groupID;

quit;

PG

PG
weweaw
Obsidian | Level 7

PGStats

The groups option is causing a syntax error.  It seems groups is not an option for this proc SAS/STAT(R) 9.2 User's Guide, Second Edition

It is possible it is another option?

Thank you so much.

PGStats
Opal | Level 21

I'm using SAS 9.4. This would certainly work in 9.2 :

data test;

call streaminit(876857);

do id = 1 to 64;

    x = rand("NORMAL");

    output;

    end;

run;

proc sql;

select

    round(0.5*count(*)) as n50,

    round(0.3*count(*)) as n30

into :n50, :n30

from test;

quit;

data sample;

set test;

do replicate = 1 to 100;

    rnd = rand("UNIFORM");

    output;

    end;

run;

proc sort data=sample; by replicate rnd; run;

data sampleGroups;

set sample; by replicate;

if first.replicate then n = 0;

n + 1;

if n <= &n50 then bucket = "A";

else if n <= &n50 + &n30 then bucket = "B";

else bucket = "C";

drop n rnd;

run;

proc sql;

create table stats as

select replicate, bucket, mean(x) as meanX

from sampleGroups

group by replicate, bucket;

quit;

PG

PG
Ksharp
Super User

I am considering using proc rank .Smiley Happy

data class;
 set sashelp.class;
 ran=ranuni(0);
 keep ran;
run;
proc rank data=class out=temp groups=10;
 var ran;
 ranks rank;
run;
data want;
 merge temp(where=(rank in (0:4)) rename=(ran=A))
       temp(where=(rank in (5:7)) rename=(ran=B))
        temp(where=(rank in (8 9)) rename=(ran=C)) ;
 drop rank;
run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2023 views
  • 1 like
  • 7 in conversation