Hi everyone...
I'm trying to randomly select n samples from a larger data set X of various variables (a, b, c). Up to now I've used the SURVESELECT statement, which is basically fine. However, I need to implement a constraint, such that only samples with realizations of c=1 are considered for the drawing. I can only think of using DATA to create a new data set of all samples c=1, on which I use the SURVEYSELECT function.
Do you know a quick and efficient way?
Thank you for your great support!!!!
Use a where statement within surveyselect, e.g.,
Proc surveyselect data=data1 (where = (c=1)) out=sample method=srs sampsize=50;
run;
or
Proc surveyselect data=data1 out=sample method=srs sampsize=50;
where c=1;
run;
Use a where statement within surveyselect, e.g.,
Proc surveyselect data=data1 (where = (c=1)) out=sample method=srs sampsize=50;
run;
or
Proc surveyselect data=data1 out=sample method=srs sampsize=50;
where c=1;
run;
And if you plan to do this for ALL levels of c, sort by c and use a BY statement in PROC SURVEYSELECT.
Thank you, mfischer! I thought the WHERE statement won't work with PROC SURVEYSELECT:
@Rick: Would you mind explaining the sorting in a little bit more detail. Indeed I'm planning to do this for all levels of c. But why do I need to sort the original data set when I use the WHERE option? And where does the BY statement go in PROC SURVEYSELECT?
THX!!!
BY-group processing is fundamental to SAS programming. It tells a SAS procedure to repeat an analysis for every unique value of the variables listed on the BY statement. The data must first be sorted by the variables that you are listing on the BY statement.
Some documentation is here: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001278468.htm
Can someone recommend some tutorial-level SUGI papers or other "getting started" resources?
Thank you Rick for all your effort!!!
Just one more question regarding the WHERE option in combination with PROC SURVEYSELECT. Is it possible to combine the WHERE option with values from another data set?
In detail: I'm trying to draw several observations from a data set X which fulfill certain conditions, which are specified in the WHERE option. One of the conditions is that none of the drawn values is equal to the values of a different data set Y. Since the values of Y are not static It would be great if the where option could take the specific values of Y automatically.
Example:
x= [1 2 ....... 100]
y= [2 55 83 99]
Draw 10 observations of X. Exclude the values of Y.
If tried it with the exact location of the other data sat (i.e.: proc surveyselect data=work.x (where=(X<>work.Y))) but it won't accept it:(
Thank you for your great support!!!
You can just add a step to generate the list of values to exclude. One way is to create a macro variable with the list.
* Setup some data for the example;
data y;
do y=2,55,83,99; output; end;
run;
data x;
do x=1 to 100; output; end;
run;
* Create the list into a macro variable ;
proc sql noprint ;
select distinct y into :ylist separated by ' '
from y
;
quit;
* Use the list in a WHERE dataset option ;
proc surveyselect data=x (where=(x not in (&ylist))) ....
Hi Tom!
Thank you for your quick help.
I have to admit that I have no experience with macro variables. I've tried to implement your code in SAS. However, I don't really get it to work. When I ran the code with a sample size of 96 the PROC SURVEYSELECT data set "test" contained also the excluded values of y.
Here is the exact code:
data y;
do y=2,55,83,99; output; end
run;
data x;
do x=1 to 100; output; end
run;
proc sql noprint;
select distinct y into :ylist separated by '' from y;
quit;
proc surveyselect data=x (where=(x not in (&ylist)))method=SRS sampsize=96 seed=0 out=test;
run;
Thank you so much!!!
please change "separated by ' '" to "separated by ',' "
data y;
do y=2,55,83,99; output; end;
run;
data x;
do x=1 to 100; output; end;
run;
proc sql noprint;
select distinct y into :ylist separated by ','
from y;
quit;
proc surveyselect data=x (where=(x not in (&ylist)))method=SRS sampsize=96 seed=0 out=test;
run;
You do not need to use a comma, you can use either a comma or spaces to separate lists of numbers.
The problem with Mark's code is that there is no space inside the quotes so the value turns into one really long string.
91 %put &ylist;
2558399
Thank you Tom!
Silly me.....
Thank you for your help, Tom!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.