Ash_75: Sampling students from K out of N schools by 3 stratum levels?
Thursday
I would be very grateful for your help.
First, note that my data set has one record per student, covering every student in every classroom within each school.
My goal is to randomly sample K distinct schools out of N schools.
Then, in each of the K schools, I want to sample a different number of students at each level of stratum1:
at 2 levels of stratum1 we will randomly sample 4 students, and at the third level we will sample 5 students.
I am thinking of splitting the sampling into two steps:
1- Sample K distinct schools from the N schools:
proc sort data=my_data ; by school;run;
data my_data ;set my_data ;
by school;
/*Placing a random number for school and for students*/
retain uSchool 0;
if first.school then uschool=ranuni(12222154);
ustudent=ranuni(8744);
run;
proc sql;
create table distinct_schools as
/* stratum1 varies within a school, so it is left out to keep one row per school */
select distinct school,uschool, stratum2 , stratum3, stratum4
from my_data;
quit;
proc sort data=distinct_schools ; by uschool;run;
proc surveyselect data=distinct_schools out=K_schools outsort=k_schools_sort
method=srs sampsize=60 seed=1234;
control stratum2 stratum3 stratum4;
run;
2- Perform a second sampling step within the K selected schools: sample 4 students at each of two levels of stratum1 and 5 students at the third level.
proc sql;
create table Students_frame as
select b.*
/* join only the K selected schools, not all N schools */
from K_schools as a inner join my_data as b
on a.school=b.school;
quit;
/* STRATA needs the data sorted by the strata variables first */
proc sort data=Students_frame; by uschool stratum1 ustudent;run;
proc surveyselect data=Students_frame out=Students_sample outsort=sort_Students_sample
method=srs sampsize=( 4 4 5) seed=323161;
strata uschool stratum1;
control stratum2 stratum3 stratum4;
run;
Do you have another suggestion that does the sampling in one step instead of splitting it into two?
I would consider creating a data set with the desired sample sizes (or proportions) and passing it to PROC SURVEYSELECT.
The first pass, selecting K of the N schools, can happen within the DATA step that creates the sizes/proportions, which reduces the process to two or three steps overall, depending on whether you need to sort.
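A minimal sketch of that idea, under loud assumptions: stratum1 is numeric with levels 1, 2 and 3, level 3 is the one that gets 5 students, K = 60, and the data set names school_levels, sizes and students_frame are hypothetical. The DATA step picks the schools with selection sampling (Knuth's Algorithm S, swapped in here as one way to select exactly K of N in a single pass) and writes the per-stratum sizes at the same time. PROC SURVEYSELECT reads those sizes through a SAMPSIZE= secondary data set, which holds the strata variables plus the sample size in a variable named _NSIZE_.

```sas
%let K = 60;   /* number of schools to select (assumed) */

/* Distinct school x stratum1 combinations, and N = number of schools */
proc sql noprint;
   create table school_levels as
   select distinct school, stratum1
   from my_data
   order by school, stratum1;
   select count(distinct school) into :N trimmed
   from my_data;
quit;

/* One pass: select K of the N schools (selection sampling) and write */
/* the stratum sample sizes for the selected schools only.            */
data sizes(keep=school stratum1 _nsize_);
   set school_levels;
   by school;
   retain picked seen keep_school 0;
   if _n_ = 1 then call streaminit(1234);
   if first.school then do;
      /* take this school with probability (K - picked) / (N - seen) */
      keep_school = (rand('uniform') < (&K - picked) / (&N - seen));
      picked = picked + keep_school;
      seen = seen + 1;
   end;
   if keep_school then do;
      _nsize_ = ifn(stratum1 = 3, 5, 4);  /* assumed: level 3 gets 5 students */
      output;
   end;
run;

/* Student frame restricted to the selected schools, sorted by the strata */
proc sql;
   create table students_frame as
   select b.*
   from (select distinct school from sizes) as a
        inner join my_data as b
        on a.school = b.school
   order by b.school, b.stratum1;
quit;

/* Students within each selected school x stratum1 stratum */
proc surveyselect data=students_frame out=students_sample
                  method=srs sampsize=sizes seed=323161;
   strata school stratum1;
run;
```

Because sizes is built from the school x stratum1 combinations that actually occur, its strata match the frame exactly. If a school has fewer students at some level than requested, you may need the SELECTALL option on the PROC SURVEYSELECT statement.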
One way to select a first-stage sample of schools is to use the CLUSTER statement in PROC SURVEYSELECT.
For example,
%let K = 60; /* number of schools to select */
proc surveyselect data=my_data out=SchoolSample sampsize=&K seed=1234;
cluster school;
run;
It is necessary to invoke PROC SURVEYSELECT separately for each stage of selection. For your example, you can select the first-stage sample of schools from DATA=my_data, and then select the second-stage sample of students from the selected schools (DATA=SchoolSample).
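A sketch of that second stage, assuming SchoolSample comes from the CLUSTER example above, stratum1 is numeric with level 3 getting 5 students, and stage2_sizes is a hypothetical name for the sample-size table passed through SAMPSIZE=:

```sas
/* STRATA requires the input to be sorted by the strata variables */
proc sort data=SchoolSample;
   by school stratum1;
run;

/* Hypothetical sample-size table: one row per school x stratum1 level */
proc sql;
   create table stage2_sizes as
   select distinct school, stratum1,
          case when stratum1 = 3 then 5 else 4 end as _nsize_
   from SchoolSample
   order by school, stratum1;
quit;

/* Second stage: students within each selected school and stratum1 level */
proc surveyselect data=SchoolSample out=StudentSample
                  method=srs sampsize=stage2_sizes seed=323161;
   strata school stratum1;
run;
```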
It looks like your SURVEYSELECT code includes a CONTROL statement together with METHOD=SRS. This will produce an error because CONTROL sorting applies only to systematic and sequential selection methods.
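Two ways around that error, sketched against the distinct_schools table from the question with the same sample size of 60 (the output name K_schools_sys is hypothetical): drop the CONTROL statement and keep METHOD=SRS, or keep the CONTROL variables for implicit stratification by switching to METHOD=SYS. OUTSORT= is meant for use with CONTROL sorting, so it only appears in the second call.

```sas
/* Option 1: plain simple random sample of schools, no CONTROL sorting */
proc surveyselect data=distinct_schools out=K_schools
                  method=srs sampsize=60 seed=1234;
run;

/* Option 2: systematic sample with implicit stratification by the     */
/* CONTROL variables; OUTSORT= keeps the control-sorted input data set */
proc surveyselect data=distinct_schools out=K_schools_sys outsort=k_schools_sort
                  method=sys sampsize=60 seed=1234;
   control stratum2 stratum3 stratum4;
run;
```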