jenny_li
Calcite | Level 5

Hi,

I wrote the code below to randomly select 10 samples in proportion to the “class” distribution.

 

data list;
input Class $ Name $ 3-11 Marks ;
cards;
A student1  100
A student2  100
A student3  90
A student4  80
A student5  70
A student6  60
A student7  50
A student8  40
A student9  30
A student10 20
B student11 100
B student12 90
B student13 90
B student14 80
B student15 70
B student16 60
C student17 100
C student18 100
C student19 100
C student20 50
run;

 

 

/* select random samples based on the proportion */
proc surveyselect data = list out = list_sample method = srs sampsize=10 seed = 9876;
strata class / alloc=proportional;
run;

proc freq data=list_sample;
table class;
run;

 

In the output sample, only one student with 100 marks was selected in each class (3 such cases in total).

But now I want to add one more requirement:

All marks=100 cases (6 in total here) should be included in the sample.

How can I ensure this? I tried certainty sampling with the code below:

 

data list2;
set list;
if marks=100 then ranking=3;
else ranking=1;
run;

proc surveyselect data = list2 out = list2_sample method = pps certsize=2 sampsize=10 seed = 9876;
strata class / alloc=proportional;
size ranking;
run;

proc freq data=list2_sample;
table class;
run;

 

All 100-mark cases in classes A and B were selected, but there is an error for class C:

 

ERROR: The number of certainty units exceeds the specified sample size.

 

My idea is: if the number of certainty units exceeds the specified sample size, just choose randomly among those certainty units to meet the sample size. But I don't know how to make SAS do this.

My ultimate goal is to select in proportion to the number of observations in each class (that's why I use proportional allocation), to select all marks=100 cases with first priority, and to select the remainder randomly from the rest of the pool.

Is there a better way to solve the problem? Many thanks for the help!

 

4 REPLIES 4
data_null__
Jade | Level 19

Do you want all the 100s plus 10 random obs., or 10-6=4 random obs.?

jenny_li
Calcite | Level 5

Class   Sample size based on proportion   No. of 100 marks in population   Samples should be taken from
A       5                                 2                                2 100 marks + 3 random
B       3                                 1                                1 100 marks + 2 random
C       2                                 3                                randomly select 2 of the 3 100 marks

 

Certainty sampling actually works well for A and B, but I cannot figure out how to tell SAS what to do for C, where it raises an error because the certainty units exceed the stratum sample size.

Or is there a better way to solve the problem without using certainty sampling?
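
For reference, the per-class breakdown described above could be computed rather than worked out by hand. The following is only a rough, untested sketch: it assumes the `list` data set from the first post, and the names `list_alloc`, `alloc`, `n100`, `take_100` and `take_random` are made up for illustration.

```sas
/* Proportional allocation only; NOSAMPLE means no units are drawn */
proc surveyselect data=list out=list_alloc method=srs sampsize=10 seed=9876;
    strata class / alloc=proportional nosample;
run;

/* Per class: allocated size, number of marks=100 cases,             */
/* how many 100s to take (capped at the allocation), and how many    */
/* of the remaining students to draw at random                       */
proc sql;
    select a.class,
           a.SampleSize                          as alloc,
           c.n100,
           min(a.SampleSize, c.n100)             as take_100,
           a.SampleSize - calculated take_100    as take_random
    from list_alloc as a
    inner join
         (select class, sum(marks = 100) as n100
          from list
          group by class) as c
    on a.class = c.class;
quit;
```

For class C the cap is what matters: the allocation is smaller than the number of 100s, so `take_random` comes out as 0 and only some of the 100s can be kept.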

PGStats
Opal | Level 21

How about:

 

 
/* Get the proportional allocation */ 
proc surveyselect data = list out = list_alloc method = srs sampsize=10 seed = 9876;
strata class / alloc=proportional nosample;
run;

/* Subtract the marks=100 count from the allocation */
proc sql;
create table list_select as
select 
    class, 
    max(0, SampleSize - (select sum(marks=100) from list where class=a.class)) as sampleSize
from list_alloc as a;
quit;

/* Select remaining samples */
proc surveyselect 
    data = list (where=(marks ne 100)) 
    out = list_sample 
    method = srs sampsize=list_select seed = 9876;
strata class;
run;

/* Join the marks=100 students to the random samples */
proc sql;
create table list_final_sample as
select * from list where marks=100
union all corr
select * from list_sample
order by class, name;
select * from list_final_sample;
quit;
PG
jenny_li
Calcite | Level 5

Thanks for the code.

But in that case, a total of 11 samples were selected, because all three 100-mark samples in class C were selected.
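
One possible way to keep the total at 10 is to cap the marks=100 cases at each class's allocation. The sketch below is untested and rests on a few assumptions: it reuses the `list` data set and the NOSAMPLE allocation step from the reply above, while `list_keyed`, `priority`, `u` and `n_taken` are hypothetical names introduced here. Instead of certainty sampling, it gives every student a random key, sorts the marks=100 students to the front of each class, and keeps the first SampleSize rows per class.

```sas
/* Step 1: proportional allocation only (no units drawn) */
proc surveyselect data=list out=list_alloc method=srs sampsize=10 seed=9876;
    strata class / alloc=proportional nosample;
run;

/* Step 2: flag the marks=100 students and attach a random key */
data list_keyed;
    call streaminit(9876);      /* seed for rand(); separate from the SURVEYSELECT seed */
    set list;
    priority = (marks = 100);   /* 1 for marks=100, 0 otherwise */
    u = rand('uniform');
run;

/* Within each class: 100s first, then everyone else, each group in random order */
proc sort data=list_keyed;
    by class descending priority u;
run;

/* Step 3: keep the first SampleSize rows of each class */
data list_final_sample;
    merge list_keyed list_alloc(keep=class SampleSize);
    by class;
    if first.class then n_taken = 0;
    n_taken + 1;                /* running count within the class */
    if n_taken <= SampleSize;
    drop priority u n_taken;
run;

proc freq data=list_final_sample;
    table class;
run;
```

If the allocation comes out as 5, 3 and 2 as in the table earlier in the thread, this should give 2 hundreds plus 3 random students in A, 1 plus 2 in B, and a random 2 of the 3 hundreds in C, for 10 in total. Because the non-100 students within a class are ordered by a uniform random key, taking the first few of them amounts to a simple random sample without replacement from that group.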
