PROC SURVEYSELECT with Constraints

11-20-2011 12:56 PM

Hi everyone...

I'm trying to randomly select n observations from a larger data set X with several variables (a, b, c). Up to now I've used PROC SURVEYSELECT, which works fine. However, I need to add a constraint so that only observations with c=1 are eligible for the draw. The only approach I can think of is a DATA step that creates a new data set of all observations with c=1, and then running PROC SURVEYSELECT on that.

Do you know a quick and efficient way?

Thank you for your great support!!!!

Accepted Solutions

Solution

Posted in reply to MarkGIP

11-20-2011 09:48 PM

Use a WHERE= data set option or a WHERE statement with PROC SURVEYSELECT, e.g.,

proc surveyselect data=data1 (where=(c=1)) out=sample method=srs sampsize=50;
run;

or

proc surveyselect data=data1 out=sample method=srs sampsize=50;
  where c=1;
run;

All Replies

Posted in reply to MarkGIP

11-21-2011 09:59 AM

And if you plan to do this for ALL levels of c, sort by c and use a BY statement in PROC SURVEYSELECT.

Posted in reply to Rick_SAS

11-21-2011 11:09 AM

Thank you, mfischer! I thought the WHERE statement wouldn't work with PROC SURVEYSELECT.

@Rick: Would you mind explaining the sorting in a little more detail? I am indeed planning to do this for all levels of c. But why do I need to sort the original data set when I use the WHERE option? And where does the BY statement go in PROC SURVEYSELECT?

THX!!!

Posted in reply to MarkGIP

11-21-2011 11:39 AM

BY-group processing is fundamental to SAS programming. It tells a SAS procedure to repeat an analysis for every unique value of the variables listed on the BY statement. The data must first be sorted by the variables that you are listing on the BY statement.

Some documentation is here: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001278468.htm
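
As a rough illustration of sampling within every level of c, here is a minimal sketch. Note that it swaps in the STRATA statement, which is the statement PROC SURVEYSELECT documents for selecting independently within groups, rather than the BY statement mentioned above; like BY, it goes between the PROC statement and RUN, and it also expects the input to be sorted by the grouping variable. The test data, the data set name data1_sorted, the seed, and the sample size of 10 per level are all assumptions made for the example.

```sas
/* Hypothetical input: 300 rows, c takes the values 1, 2, 3 */
data data1;
  do c = 1 to 3;
    do a = 1 to 100;
      b = a * c;
      output;
    end;
  end;
run;

/* Sort so that rows with the same value of c are contiguous */
proc sort data=data1 out=data1_sorted;
  by c;
run;

/* STRATA draws a separate simple random sample within each level of c;
   SAMPSIZE=10 is the number of rows taken from every level */
proc surveyselect data=data1_sorted out=sample method=srs sampsize=10 seed=12345;
  strata c;
run;
```

With this layout no WHERE clause is needed, because every level of c is sampled in a single pass. Each level must contain at least as many rows as SAMPSIZE requests.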

Can someone recommend some tutorial-level SUGI papers or other "getting started" resources?

Posted in reply to Rick_SAS

12-10-2011 07:48 AM

Thank you Rick for all your effort!!!

Just one more question regarding the WHERE option in combination with PROC SURVEYSELECT. Is it possible to combine the WHERE option with values from another data set?

In detail: I'm trying to draw several observations from a data set X that fulfill certain conditions, which are specified in the WHERE option. One of the conditions is that none of the drawn values equals any value in a different data set Y. Since the values of Y are not static, it would be great if the WHERE option could pick up the current values of Y automatically.

Example:

x= [1 2 ....... 100]

y= [2 55 83 99]

Draw 10 observations of X. Exclude the values of Y.

I've tried it with the exact location of the other data set (i.e.: proc surveyselect data=work.x (where=(X<>work.Y))), but SAS won't accept it.

Thank you for your great support!!!

Posted in reply to MarkGIP

12-10-2011 12:06 PM

You can just add a step to generate the list of values to exclude. One way is to create a macro variable with the list.

* Set up some data for the example;
data y;
  do y=2,55,83,99; output; end;
run;

data x;
  do x=1 to 100; output; end;
run;

* Create the list into a macro variable ;
proc sql noprint ;
  select distinct y into :ylist separated by ' '
  from y
  ;
quit;

* Use the list in a WHERE dataset option ;
proc surveyselect data=x (where=(x not in (&ylist))) ....

12-10-2011 07:12 PM

Hi Tom!

Thank you for your quick help.

I have to admit that I have no experience with macro variables. I've tried to implement your code in SAS, but I can't get it to work. When I ran the code with a sample size of 96, the PROC SURVEYSELECT output data set "test" also contained the excluded values of y.

Here is the exact code:

data y;
  do y=2,55,83,99; output; end;
run;

data x;
  do x=1 to 100; output; end;
run;

proc sql noprint;
  select distinct y into :ylist separated by '' from y;
quit;

proc surveyselect data=x (where=(x not in (&ylist))) method=SRS sampsize=96 seed=0 out=test;
run;

Thank you so much!!!

Posted in reply to MarkGIP

12-10-2011 07:48 PM

please change "separated by ' '" to "separated by ',' "

data y;
  do y=2,55,83,99; output; end;
run;

data x;
  do x=1 to 100; output; end;
run;

proc sql noprint;
  select distinct y into :ylist separated by ','
  from y;
quit;

proc surveyselect data=x (where=(x not in (&ylist))) method=SRS sampsize=96 seed=0 out=test;
run;

Posted in reply to Linlin

12-10-2011 09:02 PM

You do not need to use a comma; you can use either commas or spaces to separate lists of numbers.

The problem with Mark's code is that there is no space inside the quotes so the value turns into one really long string.

%put &ylist;
2558399
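
To compare the separators side by side, a small sketch along these lines may help. It rebuilds the data set y from the example above; the macro variable names nosep and spaced are made up for illustration and are not from the original code.

```sas
/* Same exclusion values as in the example above */
data y;
  do y=2,55,83,99; output; end;
run;

proc sql noprint;
  /* No blank between the quotes: the values are concatenated */
  select distinct y into :nosep separated by '' from y;
  /* One blank between the quotes: the values stay separate */
  select distinct y into :spaced separated by ' ' from y;
quit;

/* Write both lists to the log for comparison */
%put nosep=&nosep;
%put spaced=&spaced;
```

Checking the macro variable with %put before using it in the WHERE= option is a quick way to catch this kind of problem: a list that has collapsed into a single number excludes nothing that actually occurs in x.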

12-10-2011 09:47 PM

Thank you Tom!

12-11-2011 04:45 AM

Silly me.....

Thank you for your help, Tom!!!