## Sampling without replacement. Create two data sets from a population in ratio 2:1

I want to sample without replacement to create two complementary data sets in a 2:1 ratio from the data. Data set "A" should have n=17, and the remaining n=9 should form data set "B".

```
DATA hsb25;
/* Only the first six values on each line are read; the five trailing scores are ignored. */
INPUT id gender $ race ses schtype $ prog;
DATALINES;
147 f 1 3 pub 1 47  62  53  53  61
108 m 1 2 pub 2 34  33  41  36  36
18 m 3 2 pub 3 50  33  49  44  36
153 m 1 2 pub 3 39  31  40  39  51
50 m 2 2 pub 2 50  59  42  53  61
51 f 2 1 pub 2 42  36  42  31  39
102 m 1 1 pub 1 52  41  51  53  56
57 f 1 2 pub 1 71  65  72  66  56
160 f 1 2 pub 1 55  65  55  50  61
136 m 1 2 pub 1 65  59  70  63  51
88 f 1 1 pub 1 68  60  64  69  66
177 m 1 2 pri 1 55  59  62  58  51
95 m 1 1 pub 1 73  60  71  61  71
144 m 1 1 pub 2 60  65  58  61  66
139 f 1 2 pub 1 68  59  61  55  71
135 f 1 3 pub 1 63  60  65  54  66
191 f 1 1 pri 1 47  52  43  48  61
171 m 1 2 pub 1 60  54  60  55  66
22 m 3 2 pub 3 42  39  39  56  46
47 f 2 3 pub 1 47  46  49  33  41
56 m 1 2 pub 3 55  45  46  58  51
128 m 1 1 pub 1 39  33  38  47  41
36 f 2 3 pub 2 44  49  44  35  51
53 m 2 2 pub 3 34  37  46  39  31
26 f 4 1 pub 1 60  59  62  61  51
;
RUN;
```

The code below creates data set "A", but how do I output the remaining records and call them "B"?

```
proc surveyselect data = hsb25 method = SRS rep = 1
sampsize = 17 seed = 12345 out = A;
id _all_;
run;
```

‎09-01-2017 03:59 PM
## Re: Sampling without replacement. Create two data sets from a population in ratio 2:1

If you use the OUTALL option on the PROC statement, all records in the input data set are written to the output data set, along with an additional variable named Selected. A value of 1 for Selected marks the sampled records, and 0 marks the records that were not selected.

Depending on what you are actually doing next, you could split the data using that variable, or use WHERE clauses or BY-group processing with that data set.

```
proc surveyselect data = hsb25 method = SRS outall rep=1
sampsize = 17 seed = 12345 out = A;
id _all_;
run;

data seta setb;
set A;
if selected=1 then output seta;
else output setb;
run;
```
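If you would rather not split the data, the WHERE and BY-group alternatives mentioned above might look roughly like the sketch below. It assumes A was created with OUTALL as shown, so it holds every record plus the Selected flag. PROC MEANS on `ses` is only a stand-in for whatever analysis comes next, and `A_sorted` is a made-up data set name.

```sas
/* Assumes A comes from PROC SURVEYSELECT with OUTALL,
   so it contains all input records plus the Selected flag. */

/* WHERE clause: analyze only the sampled records */
proc means data = A;
where Selected = 1;
var ses;
run;

/* BY-group processing: sort by the flag, then analyze both groups in one step */
proc sort data = A out = A_sorted;   /* A_sorted is a hypothetical name */
by Selected;
run;

proc means data = A_sorted;
by Selected;
var ses;
run;
```

Either way, the sampled and unsampled records stay in one data set, which can be handy if the two groups are compared later.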

I don't think you're actually gaining anything with the REP=1.

BTW your example data only has 25 records so we can't get 17 and 9.
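A quick way to confirm the sizes of the two groups is to tabulate the Selected flag. This is a minimal sketch, assuming A was created with OUTALL and SAMPSIZE=17 as above. With the 25 records posted, it should show 17 records with Selected=1 and 8 with Selected=0.

```sas
/* Count how many records fall in each group of the OUTALL output */
proc freq data = A;
tables Selected;
run;
```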

