Solved
Contributor
Posts: 33

# PROC SURVEYSELECT with Constraints

Hi everyone...

I'm trying to randomly select n samples from a larger data set X of various variables (a, b, c). Up to now I've used the SURVESELECT statement, which is basically fine. However, I need to implement a constraint, such that only samples with realizations of c=1 are considered for the drawing. I can only think of using DATA to create a new data set of all samples c=1, on which I use the SURVEYSELECT function.

Do you know a quick and efficient way?

Thank you for your great support!!!!

Accepted Solutions
Solution
‎11-20-2011 09:48 PM
Contributor
Posts: 58

## Re: PROC SURVEYSELECT with Constraints

Use a where statement within surveyselect, e.g.,

Proc surveyselect data=data1 (where = (c=1)) out=sample method=srs sampsize=50;

run;

or

Proc surveyselect data=data1 out=sample method=srs sampsize=50;

where c=1;

run;

All Replies
Solution
‎11-20-2011 09:48 PM
Contributor
Posts: 58

## Re: PROC SURVEYSELECT with Constraints

Use a where statement within surveyselect, e.g.,

Proc surveyselect data=data1 (where = (c=1)) out=sample method=srs sampsize=50;

run;

or

Proc surveyselect data=data1 out=sample method=srs sampsize=50;

where c=1;

run;

SAS Super FREQ
Posts: 4,247

## Re: PROC SURVEYSELECT with Constraints

And if you plan to do this for ALL levels of c, sort by c and use a BY statement in PROC SURVEYSELECT.

Contributor
Posts: 33

## Re: PROC SURVEYSELECT with Constraints

Thank you, mfischer! I thought the WHERE statement won't work with PROC SURVEYSELECT:

@Rick: Would you mind explaining the sorting in a little bit more detail. Indeed I'm planning to do this for all levels of c. But why do I need to sort the original data set when I use the WHERE option? And where does the BY statement go in PROC SURVEYSELECT?

THX!!!

SAS Super FREQ
Posts: 4,247

## Re: PROC SURVEYSELECT with Constraints

BY-group processing is fundamental to SAS programming.  It tells a SAS procedure to repeat an analysis for every unique value of the variables listed on the BY statement. The data must first be sorted by the variables that you are listing on the BY statement.

Some documentation is here: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001278468.htm

Can someone recommend some tutorial-level SUGI papers or other "getting started" resources?

Contributor
Posts: 33

## Re: PROC SURVEYSELECT with Constraints

Thank you Rick for all your effort!!!

Just one more question regarding the WHERE option in combination with PROC SURVEYSELECT. Is it possible to combine the WHERE option with values from another data set?

In detail: I'm trying to draw several observations from a data set X which fulfill certain conditions, which are specified in the WHERE option. One of the conditions is that none of the drawn values is equal to the values of a different data set Y. Since the values of Y are not static It would be great if the where option could take the specific values of Y automatically.

Example:

x= [1 2 ....... 100]

y= [2 55 83 99]

Draw 10 observations of X. Exclude the values of Y.

If tried it with the exact location of the other data sat (i.e.: proc surveyselect data=work.x (where=(X<>work.Y))) but it won't accept it

Thank you for your great support!!!

Super User
Posts: 8,127

## Re: PROC SURVEYSELECT with Constraints

You can just add a step to generate the list of values to exclude. One way is to create a macro variable with the list.

`* Setup some data for the example;data y;   do y=2,55,83,99; output; end;run;data x;   do x=1 to 100; output; end;run;* Create the list into a macro variable ;proc sql noprint ;  select distinct y into :ylist separated by ' '  from y  ;quit;* Use the list in a WHERE dataset option ;proc surveyselect data=x (where=(x not in (&ylist))) ....`
Contributor
Posts: 33

## Re: PROC SURVEYSELECT with Constraints

Hi Tom!

Thank you for your quick help.

I have to admit that I have no experience with macro variables. I've tried to implement your code in SAS. However, I don't really get it to work. When I ran the code with a sample size of 96 the PROC SURVEYSELECT data set "test" contained also the excluded values of y.

Here is the exact code:

`data y; do y=2,55,83,99; output; endrun; data x; do x=1 to 100; output; endrun; proc sql noprint; select distinct y into :ylist separated by '' from y;quit;  proc surveyselect data=x (where=(x not in (&ylist)))method=SRS sampsize=96 seed=0 out=test;run;   `

Thank you so much!!!

Super Contributor
Posts: 1,636

## Re: PROC SURVEYSELECT with Constraints

please change "separated by ' '" to "separated by ',' "

data y;

do y=2,55,83,99; output; end;

run;

data x;

do x=1 to 100; output; end;

run;

proc sql noprint;

select distinct y into :ylist separated by ','

from y;

quit;

proc surveyselect data=x (where=(x not in (&ylist)))method=SRS sampsize=96 seed=0 out=test;

run;

Super User
Posts: 8,127

## Re: PROC SURVEYSELECT with Constraints

You do not need to use a comma, you can use either a comma or spaces to separate lists of numbers.

The problem with Mark's code is that there is no space inside the quotes so the value turns into one really long string.

91   %put &ylist;

2558399

Super Contributor
Posts: 1,636

Thank you Tom!

Contributor
Posts: 33

## Re: PROC SURVEYSELECT with Constraints

Silly me.....

Thank you for your help, Tom!!!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 11 replies
• 2975 views
• 1 like
• 5 in conversation