Hi,
I wrote the code below to randomly select 10 samples in proportion to the “class” distribution.
data list;
input Class $ Name $ 3-11 Marks ;
cards;
A student1 100
A student2 100
A student3 90
A student4 80
A student5 70
A student6 60
A student7 50
A student8 40
A student9 30
A student10 20
B student11 100
B student12 90
B student13 90
B student14 80
B student15 70
B student16 60
C student17 100
C student18 100
C student19 100
C student20 50
run;
/* select random samples based on the proportion */
proc surveyselect data = list out = list_sample method = srs sampsize=10 seed = 9876;
strata class / alloc=proportional;
run;
proc freq data=list_sample;
table class;
run;
In the output sample, only one case with marks=100 was selected in each class (3 such cases in total).
But now I want to add one more requirement:
All cases with marks=100 (6 in total here) should be included in the sample.
How can I ensure this? I tried certainty sampling with the code below:
data list2;
set list;
if marks=100 then ranking=3;
else ranking=1;
run;
proc surveyselect data = list2 out = list2_sample method = pps certsize=2 sampsize=10 seed = 9876;
strata class / alloc=proportional;
size ranking;
run;
proc freq data=list2_sample;
table class;
run;
All marks=100 cases in classes A and B were selected, but class C, which is allocated only 2 of the 10 samples yet has 3 cases with marks=100, fails with an error:
ERROR: The number of certainty units exceeds the specified sample size.
My idea is: if the number of certainty units exceeds the specified sample size, then just choose randomly among those certainty units to meet the sample size. But I don't know how to implement this.
My ultimate goal is to select in proportion to the number of observations in each class (that's why I use proportional allocation), to select all marks=100 cases with first priority, and to select the remainder randomly from the pool.
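To make the idea concrete, here is a rough sketch of the logic I have in mind, done by hand with a data step and PROC SQL instead of PROC SURVEYSELECT. The names list3, list3_alloc, list3_sample, priority, u, n_take and taken are made up for illustration, and the per-class sample size is assumed to be simply the rounded proportional share of 10, which may not match PROC SURVEYSELECT's own allocation and may not always sum to exactly 10 for other data.

```sas
/* Flag the must-include cases and attach a random tie-breaker */
data list3;
   set list;
   if _n_ = 1 then call streaminit(9876);  /* seed the generator once */
   priority = (marks = 100);               /* 1 = marks=100, 0 = everything else */
   u = rand('uniform');                    /* random order within each priority level */
run;

/* Assumed allocation: rounded proportional share of 10 per class */
proc sql;
   create table list3_alloc as
   select a.*, round(10 * b.n_class / c.n_total) as n_take
   from list3 as a,
        (select class, count(*) as n_class from list3 group by class) as b,
        (select count(*) as n_total from list3) as c
   where a.class = b.class
   order by a.class, a.priority desc, a.u;
quit;

/* Keep the first n_take rows per class: marks=100 first, then random others */
data list3_sample;
   set list3_alloc;
   by class;
   if first.class then taken = 0;
   taken + 1;                              /* running count within the class */
   if taken <= n_take;
   drop u taken;
run;
```

With the data above this should give 5, 3 and 2 rows for classes A, B and C. In class C only 2 of the 3 marks=100 cases can fit, so they are picked at random among the three, which is the fallback behaviour described above.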
Is there a better way to solve this problem? Many thanks for the help!