Thursday
I would be grateful for any help.
First, note that my data set has one record per student, for every classroom within a school.
My goal is to randomly sample K out of N distinct schools.
Then, in each of the K schools, I want to sample a different number of students at each level of stratum1:
at two levels of stratum1 I will randomly sample 4 students, and at the third level I will sample 5 students.
I am thinking of splitting the sampling into two steps:
1- Sample K distinct schools from the N schools:
proc sort data=my_data; by school; run;

data my_data;
  set my_data;
  by school;
  /* Place a random number on each school (retained) and on each student */
  retain uSchool 0;
  if first.school then uSchool = ranuni(12222154);
  ustudent = ranuni(8744);
run;
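proc sql;
  /* One row per school. stratum1 is left out because it varies within a
     school (it is the student-level stratum used in step 2). */
  create table distinct_schools as
  select distinct school, uSchool, stratum2, stratum3, stratum4
  from my_data;
quit;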
proc sort data=distinct_schools; by uSchool; run;

proc surveyselect data=distinct_schools out=K_schools outsort=k_schools_sort
                  method=srs sampsize=60 seed=1234;
  control stratum2 stratum3 stratum4;
run;
2- Draw a second sample from the K selected schools: at two levels of stratum1, sample 4 students each, and at the third level, sample 5 students:
proc sql;
  /* Frame for step 2: only the students in the K selected schools */
  create table Students_frame as
  select b.*
  from K_schools as a inner join my_data as b
    on a.school = b.school;
quit;
/* SURVEYSELECT needs the input sorted by the STRATA variables (uSchool stratum1) */
proc sort data=Students_frame; by uSchool stratum1 ustudent; run;
proc surveyselect data=Students_frame out=Students_sample outsort=sort_Students_sample
                  method=srs sampsize=(4 4 5) seed=323161;
  strata uSchool stratum1;
  control stratum2 stratum3 stratum4;
run;
Do you have a suggestion for doing the sampling in one step instead of splitting it in two?
I would consider creating a data set with the desired sample sizes (or proportions) and passing that to PROC SURVEYSELECT.
The first pass, selecting K of the N schools, can happen within the DATA step that creates the sizes/proportions, which reduces the process to two or three steps overall (assuming you need to sort).
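A minimal sketch of that idea follows, under stated assumptions: K = 60 (taken from the SAMPSIZE=60 in the question), stratum1 is numeric and coded 1, 2, 3, and level 3 is the one that gets 5 students. To pick exactly K schools in a single DATA step pass, it swaps in selection sampling (Knuth's Algorithm S): each school is kept with probability (schools still needed)/(schools not yet seen). The same pass writes a hypothetical SizeData set (one row per school x stratum1 cell with the variable _NSIZE_), which SURVEYSELECT accepts through SAMPSIZE=. The names Sorted_students, SizeData, _need, _left and _take are illustrative only.

```sas
%let K = 60;  /* ASSUMPTION: number of schools to select */

/* N = number of distinct schools in the frame */
proc sql noprint;
  select count(distinct school) into :N trimmed
  from my_data;
quit;

/* Sort once by the second-stage strata */
proc sort data=my_data out=Sorted_students;
  by school stratum1;
run;

/* One pass: select K of N schools and build the per-cell sample sizes */
data Students_frame(drop=_need _left _take _nsize_)
     SizeData(keep=school stratum1 _nsize_);
  set Sorted_students;
  by school stratum1;
  retain _need &K _left &N _take 0;
  if _n_ = 1 then call streaminit(1234);
  if first.school then do;
    /* Keep this school with probability (schools still needed)/(schools left) */
    _take = (rand('uniform') < _need / _left);
    _need = _need - _take;
    _left = _left - 1;
  end;
  if _take then do;
    output Students_frame;
    if first.stratum1 then do;
      /* ASSUMPTION: stratum1 is coded 1, 2, 3 and level 3 gets 5 students */
      if stratum1 = 3 then _nsize_ = 5;
      else _nsize_ = 4;
      output SizeData;
    end;
  end;
run;

/* Second stage: SRS within each school x stratum1 cell, sizes read from SizeData.
   SELECTALL takes every student in a cell that has fewer students than requested. */
proc surveyselect data=Students_frame out=Students_sample
                  method=srs sampsize=SizeData seed=323161 selectall;
  strata school stratum1;
run;
```

Because SizeData is produced from the already sorted frame, it is in the same school/stratum1 order that the STRATA statement expects, so no extra sort is needed.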
@Ash_75 wrote:
Sampling students from K out of N schools by 3 stratum levels?
One way to select a first-stage sample of schools is to use the CLUSTER statement in PROC SURVEYSELECT.
For example,
%let K = 60;  /* number of schools to select */
proc surveyselect data=my_data out=SchoolSample sampsize=&K seed=1234;
  cluster school;
run;
It is necessary to invoke PROC SURVEYSELECT separately for each stage of selection. For your example, you can select the first-stage sample of schools from DATA=my_data, and then select the second-stage sample of students from the selected schools (DATA=SchoolSample).
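A sketch of the two invocations described above, assuming K = 60 and that stratum1 is numeric and coded 1, 2, 3 with 5 students at level 3. With the CLUSTER statement, the stage-1 output holds every student of each selected school, so it can serve directly as the stage-2 frame. Stage 1 output from METHOD=SRS carries SelectionProb and SamplingWeight; the sketch renames them so they are not confused with the stage-2 variables of the same names, then multiplies the two weights into an overall weight. Stage2_frame, SizeData, SchoolProb, SchoolWeight and FinalWeight are hypothetical names.

```sas
/* Stage 1: SRS of 60 schools; all students of a selected school are output */
proc surveyselect data=my_data out=SchoolSample
                  method=srs sampsize=60 seed=1234;
  cluster school;
run;

/* Stage 2 frame: keep the stage-1 probability and weight under new names */
proc sort data=SchoolSample(rename=(SelectionProb=SchoolProb SamplingWeight=SchoolWeight))
          out=Stage2_frame;
  by school stratum1;
run;

/* One row per school x stratum1 cell with its sample size */
data SizeData(keep=school stratum1 _nsize_);
  set Stage2_frame;
  by school stratum1;
  if first.stratum1;
  /* ASSUMPTION: stratum1 is coded 1, 2, 3 and level 3 gets 5 students */
  if stratum1 = 3 then _nsize_ = 5;
  else _nsize_ = 4;
run;

/* Stage 2: SRS of students within each selected school x stratum1 cell */
proc surveyselect data=Stage2_frame out=Students_sample
                  method=srs sampsize=SizeData seed=323161 selectall;
  strata school stratum1;
run;

/* Overall weight = school weight x student weight within the school */
data Students_sample;
  set Students_sample;
  FinalWeight = SchoolWeight * SamplingWeight;
run;
```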
It looks like your SURVEYSELECT code includes a CONTROL statement together with METHOD=SRS. This will produce an error because CONTROL sorting applies only to systematic and sequential selection methods.
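If plain SRS of schools is what you want, the simplest fix is to drop the CONTROL statement. If the CONTROL variables were meant to spread the sample over stratum2 to stratum4 (implicit stratification), a sketch of one option is systematic selection, where CONTROL sorting is honored. It assumes distinct_schools has one row per school and that stratum2 to stratum4 are school-level variables; K_schools_sorted is a hypothetical name for the sorted copy written by OUTSORT=.

```sas
/* Systematic selection of 60 schools; the frame is sorted by the CONTROL
   variables before selection, which spreads the sample across their levels */
proc surveyselect data=distinct_schools out=K_schools
                  method=sys sampsize=60 seed=1234 outsort=K_schools_sorted;
  control stratum2 stratum3 stratum4;
run;
```

For explicit stratification with a fixed number of schools per cell, a STRATA statement (with the input sorted by those variables) is the alternative.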