## Selecting multiple random samples

Solved
Frequent Contributor
Posts: 81

Hi,

I have a dataset with a unique ID and the source each record comes from (there are 10 sources). It looks like:

| ID | Source |
|----|--------|
| 1  | A      |
| 2  | B      |
| 3  | C      |
| 4  | D      |
| 5  | D      |
| 6  | C      |

etc.

What I'd like to do is create a new table by selecting a fixed number of observations at random from each source. For example, my criteria might look like:

| Source | Random_Selection_count |
|--------|------------------------|
| A      | 500                    |
| B      | 300                    |
| C      | 250                    |
| D      | 100                    |

etc.

I figure I can run a PROC SQL selection for each of the 10 sources and then append the results together, but I am wondering if there is a simpler way?

Accepted Solutions
Solution
‎12-03-2013 07:33 AM
Posts: 3,852

## Re: Selecting multiple random samples

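SURVEYSELECT.

/* Sample data: one row per ID with its source */
data source;
   input ID Source :$1.;
   cards;
1 A
2 B
3 C
4 D
5 D
6 C
;;;;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=source;
   by source;
run;
proc print data=source;
run;

/* Requested sample size for each stratum, stored in _NSIZE_ */
data size;
   input source :$1. _nsize_;
   cards;
A 1
B 2
C 1
D 2
;;;;
run;
proc print data=size;
run;

/* N=SIZE reads the per-stratum sample sizes from the SIZE data set */
proc surveyselect n=size data=source out=sample outall selectall;
   strata source;
run;
proc print data=sample;
run;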

All Replies
Frequent Contributor
Posts: 81

## Re: Selecting multiple random samples

Amazing, thank you very much.

Posts: 3,852

## Re: Selecting multiple random samples

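You may not want OUTALL, which keeps every input observation in the output and flags the selected ones. SELECTALL selects the entire stratum when _NSIZE_ is bigger than the number of observations in it. Use the REPS= option if you want more than one sample, and you probably want to specify SEED= so the selection is reproducible.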
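
Those suggestions might look something like the sketch below. It assumes the `source` data set (already sorted by `source`) and the `size` data set from the earlier reply; the output name `sample_reps`, the replicate count of 3, and the seed value are arbitrary placeholders, not values from this thread.

```sas
/* Sketch only: assumes SOURCE (sorted by the strata variable) and SIZE exist. */
proc surveyselect data=source
                  n=size            /* per-stratum sizes read from _NSIZE_ */
                  out=sample_reps   /* hypothetical output data set name */
                  method=srs        /* simple random sampling without replacement */
                  reps=3            /* arbitrary: three independent samples */
                  seed=20131203     /* arbitrary fixed seed for reproducibility */
                  selectall;        /* take the whole stratum if _NSIZE_ is too large */
   strata source;
run;

proc print data=sample_reps;
run;
```

Without OUTALL, the output should contain only the selected observations, and with REPS= each row carries a `Replicate` variable identifying which sample it belongs to.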

Frequent Contributor
Posts: 81

## Re: Selecting multiple random samples

Hi,

I'm revisiting this test with an additional layer of complexity, and I'm unsure how to proceed.

I have a testing grid that looks like:

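| Group   | Z      | A     | B     | C     | D     | E     | F     | G     | H     | I     | J     | Total per Group |
|---------|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-----------------|
| Group 1 | 24,100 | 991   | 991   | 991   | 1,011 |       |       |       |       |       |       | 28,084          |
| Group 2 | 24,099 | 991   | 991   | 991   | 1,010 | 1,295 | 1,322 | 1,322 | 1,321 | 1,321 | 1,322 | 35,985          |
| Group 3 | 24,100 | 991   | 991   | 991   | 1,011 | 1,295 | 1,321 | 1,321 | 1,322 | 1,321 | 1,321 | 35,985          |
| Group 4 | 24,100 | 991   | 991   | 991   | 1,011 | 1,295 | 1,322 | 1,321 | 1,321 | 1,322 | 1,321 | 35,986          |
| Total   | 96,399 | 3,964 | 3,964 | 3,964 | 4,043 | 3,885 | 3,965 | 3,964 | 3,964 | 3,964 | 3,964 | 136,040         |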

I have 4 groups, each made up of a sample drawn from the sources shown (Z and A through J).

I'm not sure how to assign a random sample to each group across the different sources.

Your help is greatly appreciated, thank you.
