Selecting multiple random samples

Frequent Contributor
Posts: 81

Hi,

I have a dataset with a unique ID on each row and a Source column that takes one of 10 values. It looks like:

ID  Source
1   A
2   B
3   C
4   D
5   D
6   C
etc.

What I'd like to do is create a new table by randomly selecting a fixed number of observations from each source. For example, my criteria might look like:

Source  Random_Selection_count
A       500
B       300
C       250
D       100
etc.

I figure I could run a PROC SQL selection for each of the 10 sources and then append the results, but I am wondering if there is a simpler way.

Thanks in advance for your help.


Accepted Solutions
Solution
12-03-2013 07:33 AM
Respected Advisor
Posts: 3,799

Re: Selecting multiple random samples

Use PROC SURVEYSELECT with a STRATA statement and a data set of per-source sample sizes:

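/* Example population: one row per ID, with the source it came from */
data source;
   input ID Source :$1.;
   cards;
1 A
2 B
3 C
4 D
5 D
6 C
;
run;

/* The STRATA statement expects the input sorted by the stratum variable */
proc sort data=source;
   by source;
run;

proc print data=source;
run;

/* Requested sample size per source, stored in the variable _NSIZE_ */
data size;
   input source :$1. _nsize_;
   cards;
A 1
B 2
C 1
D 2
;
run;

proc print data=size;
run;

/* N=size reads each stratum's sample size from the SIZE data set */
proc surveyselect n=size data=source out=sample outall selectall;
   strata source;
run;

proc print data=sample;
run;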


All Replies
Frequent Contributor
Posts: 81

Re: Selecting multiple random samples

Posted in reply to data_null__

Amazing, thank you very much.

Respected Advisor
Posts: 3,799

Re: Selecting multiple random samples

You may not want OUTALL. SELECTALL selects the entire stratum when _NSIZE_ is bigger than the number of obs in it. Use the REPS= option if you want more than one sample, and you probably want to specify SEED= as well.
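As a rough sketch of those suggestions, the call below drops OUTALL and adds SEED= and REPS=. It reuses the SOURCE and SIZE data sets from the earlier post; the seed value and the number of replicates are arbitrary choices made for illustration, not values from this thread.

```sas
/* Sketch only: seed=20131203 and reps=3 are arbitrary illustrative values. */
proc surveyselect data=source      /* population, already sorted by Source       */
                  n=size           /* per-stratum sample sizes in _NSIZE_        */
                  out=sample       /* without OUTALL, only selected rows are kept */
                  selectall        /* take the whole stratum if _NSIZE_ exceeds it */
                  seed=20131203    /* fixed seed makes the draw reproducible     */
                  reps=3;          /* three independent samples                  */
   strata source;
run;

proc print data=sample;
run;
```

With REPS=, the output data set carries a Replicate variable that identifies which sample each selected row belongs to.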

Frequent Contributor
Posts: 81

Re: Selecting multiple random samples

Posted in reply to data_null__

Hi,

I'm revisiting this test with an additional layer of complexity, and I'm unsure how to proceed.

I have a testing grid that looks like:

Group     Z        A       B       C       D       E       F       G       H       I       J       Total per Group
Group 1   24,100   991     991     991     1,011   -       -       -       -       -       -       28,084
Group 2   24,099   991     991     991     1,010   1,295   1,322   1,322   1,321   1,321   1,322   35,985
Group 3   24,100   991     991     991     1,011   1,295   1,321   1,321   1,322   1,321   1,321   35,985
Group 4   24,100   991     991     991     1,011   1,295   1,322   1,321   1,321   1,322   1,321   35,986
Total     96,399   3,964   3,964   3,964   4,043   3,885   3,965   3,964   3,964   3,964   3,964   136,040

I have 4 groups, each made up of a sample from the sources shown in the grid (Z and A through J).

I'm not sure how I can assign a random sample to each group across the different sources.

Your help is greatly appreciated, thank you.
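This follow-up received no reply in the thread. One possible approach, sketched here under assumptions and not taken from any answer, is to shuffle the IDs within each source and hand them out to the groups in order until each cell's count is filled. The GRID data set, its variable N, the helper variables SEQ and U, and the seed are all hypothetical names and values introduced for illustration. The sketch reuses the small SOURCE data set from the accepted solution as the population and assumes a given ID should land in at most one group.

```sas
/* Hypothetical grid: one row per Group/Source cell with the requested count N */
data grid;
   input group source :$1. n;
   datalines;
1 A 1
1 C 1
2 C 1
1 D 1
2 D 1
;
run;

/* 1. Expand the grid to one row per requested unit, numbered within source */
proc sort data=grid;
   by source group;
run;

data slots;
   set grid;
   by source;
   if first.source then seq = 0;
   do i = 1 to n;
      seq + 1;
      output;
   end;
   keep source group seq;
run;

/* 2. Put the population in random order within each source */
data shuffled;
   call streaminit(12345);        /* arbitrary seed, for reproducibility */
   set source;
   u = rand('uniform');
run;

proc sort data=shuffled;
   by source u;
run;

data shuffled;
   set shuffled;
   by source;
   if first.source then seq = 0;
   seq + 1;
   drop u;
run;

/* 3. Pair the k-th shuffled ID with the k-th slot of the same source */
data assigned;
   merge slots(in=wanted) shuffled(in=available);
   by source seq;
   if wanted and available;
run;

proc print data=assigned;
run;
```

If a source has fewer IDs than the grid requests, the extra slots are dropped silently by the final subsetting IF, so it may be worth comparing the counts in ASSIGNED against the grid afterwards.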

🔒 This topic is solved and locked.