Blandine
Obsidian | Level 7

Hello, I have a population of 1832 bridges and want to randomly select 385 of them. Among those 385 bridges, I want to make sure there are at least 16 that meet the criterion "STRUCTURE_KIND_043A=6". "STRUCTURE_KIND_043A" is a column in my data set "ID19_PSC". I used PROC SURVEYSELECT, which runs fine, but the resulting sample contains only 1 bridge that meets the criterion and I don't know how to fix it.

Any idea?

 

/*Randomly select 385 bridges from the sample population of PSC in ID*/

PROC SURVEYSELECT DATA=ID19_PSC OUT=RBI.ID19PSC_sample METHOD=SRS /*SRS=Simple random sampling*/
SAMPSIZE=385 SEED=1234567;
RUN;

pink_poodle
Barite | Level 11
You can put the bridges with STRUCTURE_KIND_043A=6 into a separate data set and randomly select 16 of them. Then take the rest of the bridges and randomly select 385 minus 16 (369) of them. Finally, combine the two resulting data sets using a SET statement.
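The split-and-combine approach described above could be sketched roughly as follows. This is only an illustration: the WORK data set names (kind6, other, samp_kind6, samp_other, sample385) and the second seed are made up for the example, and it assumes STRUCTURE_KIND_043A is numeric and that at least 16 bridges have the value 6.

```sas
/* Split the population: STRUCTURE_KIND_043A = 6 versus everything else. */
/* Data set names here are hypothetical, not from the original post.     */
data work.kind6 work.other;
   set ID19_PSC;
   if STRUCTURE_KIND_043A = 6 then output work.kind6;
   else output work.other;
run;

/* Draw 16 bridges from the kind-6 group */
proc surveyselect data=work.kind6 out=work.samp_kind6
   method=srs sampsize=16 seed=1234567;
run;

/* Draw the remaining 385 - 16 = 369 bridges from the rest */
proc surveyselect data=work.other out=work.samp_other
   method=srs sampsize=369 seed=7654321; /* arbitrary second seed */
run;

/* Stack the two samples into one data set of 385 bridges */
data work.sample385;
   set work.samp_kind6 work.samp_other;
run;
```

Note that this yields exactly 16 bridges of kind 6 rather than "at least 16", since the second draw excludes that group entirely.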
ballardw
Super User

You might be looking for a STRATA statement based on the values of the variable. Then you can specify a sample size per stratum. If you want only that specific value to have its own sample size, apply a custom format to the variable to collapse it into two levels for the strata, then specify the count for each stratum.

The data need to be sorted by the STRATA variable to use the STRATA statement.

 

Here is a brief example, using a data set you should have available, to demonstrate the basic approach:

proc sort data=sashelp.class
    out=work.class;
   by age;
run;

proc format;
/* the age variable only takes values
   of 11 to 16 and I want to split into
   two strata later
*/
value agestrata
other = '11 to 14'
15 - 16 = '15+';
run;

/* this code wants to select 6 from the 11 to 14 group
   and 2 from the 15+
*/
proc surveyselect data=work.class out=work.samp
   sampsize=(6 2);
   strata age;
   format age agestrata.;
run;

If your specific value is not the first or the last in sort order, you may need to use the format to add a new variable to the data set and sort on that variable; otherwise the procedure will see 3 strata and you would have to provide 3 values in the SAMPSIZE= option. The values in SAMPSIZE= are applied in order to the values of the STRATA variable, so the 6 above applies to the first stratum and the 2 to the second. This works in the example to get two records for the 15+ age group because 15+ is the second stratum in formatted order.
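For the case where the value of interest sits in the middle of the sort order, one way to add a new variable and sort on it might look like the sketch below. The variable name target and the WORK data set names are hypothetical, and the sizes 369 and 16 are simply 385 split as in the original question; it also assumes STRUCTURE_KIND_043A is numeric.

```sas
/* Hypothetical flag: 1 = the value of interest, 0 = everything else. */
/* A SAS comparison evaluates to 1 (true) or 0 (false).               */
data work.flagged;
   set ID19_PSC;
   target = (STRUCTURE_KIND_043A = 6);
run;

/* The STRATA statement expects the data sorted by the strata variable */
proc sort data=work.flagged;
   by target;
run;

/* SAMPSIZE= values follow the sorted strata order:  */
/* 369 for target=0 (first), 16 for target=1 (second) */
proc surveyselect data=work.flagged out=work.samp_flagged
   method=srs sampsize=(369 16) seed=1234567;
   strata target;
run;
```

Because the flag only ever takes the values 0 and 1, the procedure sees exactly two strata regardless of how many distinct values the original variable has.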

 

 

Blandine
Obsidian | Level 7

The code works, and I tried to replicate exactly what you showed me. My only question is: why did it put my variable "STRUCTURE_KIND_043A" in the first column?

It's not a big deal; I just want to understand which part of the code does it. "State_code_001" used to be in the first column.

 

code is:

/*This program will select randomly data from the PSC in ID while having a set number for PSC cont. and PSC*/
proc sort data=ID19_PSC
out=RBI.ID19PSC_sort;
by STRUCTURE_KIND_043A; /*sort by structure type 5=PSC and 6=PSC cont.*/
run;

 

 


 


proc format;
/* the STRUCTURE_KIND_043A variable only takes values of 5 and 6 and I want to split into two strata later */
value STRUCTURE_KIND_043Astrata
other = '5'
6 = '6';
run;
/* this code wants to select 367 from the 5 group and 18 from the 6 */
proc surveyselect data=RBI.ID19PSC_sort out=RBI.ID19PSC_BRsample
sampsize=(367 18);
strata STRUCTURE_KIND_043A;
format STRUCTURE_KIND_043A STRUCTURE_KIND_043Astrata.;
run;
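The column-order question was not answered in the thread. A likely explanation, offered here as an assumption rather than something confirmed above, is that PROC SURVEYSELECT writes the STRATA variables first in its OUT= data set. The sketch below first checks the per-stratum counts in the sample and then, if the original order matters, moves State_code_001 back to the front; the output data set name work.reordered is made up for the example.

```sas
/* Check how many bridges were drawn from each stratum (expecting 367 and 18) */
proc freq data=RBI.ID19PSC_BRsample;
   tables STRUCTURE_KIND_043A;
   format STRUCTURE_KIND_043A STRUCTURE_KIND_043Astrata.;
run;

/* Naming a variable in RETAIN before the SET statement places it first */
/* in the output data set; the remaining variables keep their order.   */
data work.reordered; /* hypothetical output name */
   retain State_code_001;
   set RBI.ID19PSC_BRsample;
run;
```

The RETAIN statement here is used only for its effect on variable order; the values are still read from the SET data set on every row.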
