@Blandine wrote:
Hello, I have a population of 1832 bridges and want to randomly select 385 of them. Among those 385 bridges, I want to make sure there are at least 16 that meet the criterion "STRUCTURE_KIND_043A=6". "STRUCTURE_KIND_043A" is a column in my data set "ID19_PSC". I used PROC SURVEYSELECT, which runs fine, but the sample contains only 1 bridge meeting that criterion and I don't know how to fix it.
Any idea?
/*Randomly select 385 bridges from the sample population of PSC in ID*/
PROC SURVEYSELECT DATA=ID19_PSC OUT=RBI.ID19PSC_sample METHOD=SRS /*SRS=simple random sampling*/
SAMPSIZE=385 SEED=1234567;
RUN;
You might be looking for a STRATA statement based on the values of that variable, which lets you specify a sample size per stratum. If you want only that specific value to have its own sample size, apply a custom format to the variable to collapse it into two levels, then specify the count for each stratum.
The data need to be sorted by the STRATA variable before you can use the STRATA statement.
Here is a brief example, using a data set you should have available, to demonstrate the basic approach:
proc sort data=sashelp.class
out=work.class;
by age;
run;
proc format;
/* the age variable only takes values
of 11 to 16 and I want to split into
two strata later
*/
value agestrata
other = '11 to 14'
15 - 16 = '15+';
run;
/* this code wants to select 6 from the 11 to 14 group
and 2 from the 15+
*/
proc surveyselect data=work.class out=work.samp
sampsize=(6 2);
strata age;
format age agestrata.;
run;
If your specific value is not the first or the last in sort order, you may need to use the format to add a new variable to the data set and sort on that variable; otherwise the procedure will see three strata and you would have to provide three values to the SAMPSIZE option. The values in SAMPSIZE are applied in order to the levels of the STRATA variable, so the 6 above applies to the first stratum and the 2 to the second. This works in the example to get two records from the 15+ age group because that group is the second stratum in formatted order.
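Applied to the bridge data, the same idea might look like the sketch below. It is untested against the actual data and makes a few assumptions: STRUCTURE_KIND_043A is numeric (compare to '6' instead if it is character), the names work.id19_psc_flag and kind6 are made up for illustration, and the population contains at least 16 bridges of kind 6. It draws exactly 16 from the kind 6 group and the remaining 369 from everything else, for 385 in total.

```sas
/* Flag the stratum of interest: 1 = kind 6, 0 = everything else.
   kind6 and work.id19_psc_flag are hypothetical names.
   Assumes STRUCTURE_KIND_043A is numeric. */
data work.id19_psc_flag;
  set ID19_PSC;
  kind6 = (STRUCTURE_KIND_043A = 6);
run;

/* The STRATA statement needs the data sorted by the strata variable */
proc sort data=work.id19_psc_flag;
  by kind6;
run;

/* SAMPSIZE values follow stratum order: kind6=0 first, then kind6=1 */
proc surveyselect data=work.id19_psc_flag out=RBI.ID19PSC_sample
    method=srs sampsize=(369 16) seed=1234567;
  strata kind6;
run;

/* Check the result: expect 369 rows with kind6=0 and 16 with kind6=1 */
proc freq data=RBI.ID19PSC_sample;
  tables kind6;
run;
```

Using a separate 0/1 variable avoids the ordering problem described above, since the group of interest is always the last stratum no matter where 6 falls among the other values.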