BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MarkGIP
Calcite | Level 5

Hi everyone...

I'm trying to randomly select n samples from a larger data set X of various variables (a, b, c). Up to now I've used the SURVESELECT statement, which is basically fine. However, I need to implement a constraint, such that only samples with realizations of c=1 are considered for the drawing. I can only think of using DATA to create a new data set of all samples c=1, on which I use the SURVEYSELECT function.

Do you know a quick and efficient way?

Thank you for your great support!!!!

1 ACCEPTED SOLUTION

Accepted Solutions
mfisher
Fluorite | Level 6

Use a where statement within surveyselect, e.g.,

Proc surveyselect data=data1 (where = (c=1)) out=sample method=srs sampsize=50;

run;

or

Proc surveyselect data=data1 out=sample method=srs sampsize=50;

where c=1;

run;

View solution in original post

11 REPLIES 11
mfisher
Fluorite | Level 6

Use a where statement within surveyselect, e.g.,

Proc surveyselect data=data1 (where = (c=1)) out=sample method=srs sampsize=50;

run;

or

Proc surveyselect data=data1 out=sample method=srs sampsize=50;

where c=1;

run;

Rick_SAS
SAS Super FREQ

And if you plan to do this for ALL levels of c, sort by c and use a BY statement in PROC SURVEYSELECT.

MarkGIP
Calcite | Level 5

Thank you, mfischer! I thought the WHERE statement won't work with PROC SURVEYSELECT:

@Rick: Would you mind explaining the sorting in a little bit more detail. Indeed I'm planning to do this for all levels of c. But why do I need to sort the original data set when I use the WHERE option? And where does the BY statement go in PROC SURVEYSELECT?

THX!!!

Rick_SAS
SAS Super FREQ

BY-group processing is fundamental to SAS programming.  It tells a SAS procedure to repeat an analysis for every unique value of the variables listed on the BY statement. The data must first be sorted by the variables that you are listing on the BY statement.

Some documentation is here: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001278468.htm

Can someone recommend some tutorial-level SUGI papers or other "getting started" resources?

MarkGIP
Calcite | Level 5

Thank you Rick for all your effort!!!

Just one more question regarding the WHERE option in combination with PROC SURVEYSELECT. Is it possible to combine the WHERE option with values from another data set?

In detail: I'm trying to draw several observations from a data set X which fulfill certain conditions, which are specified in the WHERE option. One of the conditions is that none of the drawn values is equal to the values of a different data set Y. Since the values of Y are not static It would be great if the where option could take the specific values of Y automatically.

Example:

x= [1 2 ....... 100]

y= [2 55 83 99]

Draw 10 observations of X. Exclude the values of Y.

If tried it with the exact location of the other data sat (i.e.: proc surveyselect data=work.x (where=(X<>work.Y))) but it won't accept it:(

Thank you for your great support!!!

Tom
Super User Tom
Super User

You can just add a step to generate the list of values to exclude. One way is to create a macro variable with the list.

* Setup some data for the example;

data y;

   do y=2,55,83,99; output; end;

run;

data x;

   do x=1 to 100; output; end;

run;

* Create the list into a macro variable ;

proc sql noprint ;

  select distinct y into :ylist separated by ' '

from y

  ;

quit;

* Use the list in a WHERE dataset option ;

proc surveyselect data=x (where=(x not in (&ylist))) ....

MarkGIP
Calcite | Level 5

Hi Tom!

Thank you for your quick help.

I have to admit that I have no experience with macro variables. I've tried to implement your code in SAS. However, I don't really get it to work. When I ran the code with a sample size of 96 the PROC SURVEYSELECT data set "test" contained also the excluded values of y.

Here is the exact code:

data y;

do y=2,55,83,99; output; end

run;

data x;

do x=1 to 100; output; end

run;

proc sql noprint;

select distinct y into :ylist separated by '' from y;

quit;

proc surveyselect data=x (where=(x not in (&ylist)))method=SRS sampsize=96 seed=0 out=test;

run;

Thank you so much!!!

Linlin
Lapis Lazuli | Level 10

please change "separated by ' '" to "separated by ',' "

data y;

do y=2,55,83,99; output; end;

run;

data x;

do x=1 to 100; output; end;

run;

proc sql noprint;

select distinct y into :ylist separated by ','

from y;

quit;

proc surveyselect data=x (where=(x not in (&ylist)))method=SRS sampsize=96 seed=0 out=test;

run;

Tom
Super User Tom
Super User

You do not need to use a comma, you can use either a comma or spaces to separate lists of numbers.

The problem with Mark's code is that there is no space inside the quotes so the value turns into one really long string.

91   %put &ylist;

2558399

Linlin
Lapis Lazuli | Level 10

Thank you Tom!

MarkGIP
Calcite | Level 5

Silly me.....

Thank you for your help, Tom!!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 10438 views
  • 1 like
  • 5 in conversation