03-22-2016 03:51 AM
Problem: Even with a different initial seed, the same sample is still being selected in some strata.
Detail: I have already finished a sample selection (about 5,000 units were selected from a base of 300,000) using PROC SURVEYSELECT, but now I have to do it again. This time I need to make sure that the units already selected won't be selected again (population size and sample size unchanged).
I have tried putting _seed_ + 1 as the new initial seed in the file "temp1", where the value of _seed_ is the initial seed from my first SURVEYSELECT run. However, some strata still select the same sample as in my first run.
What should I do to ensure that the already selected sample is not selected again?
Here is my code:
proc surveyselect data = data1 out = sample_selected method = sys
seed = temp1 sampsize = temp1 outseed;
strata x y z;
run;
03-22-2016 10:14 AM
Well, there is always a certain probability of obtaining the same random sample twice. With a large number of strata and possibly extreme sampling rates in some of them this probability can be substantial. But one of the most efficient ways to increase this probability dramatically is to use systematic random sampling (METHOD=SYS). Why did you choose this method?
Example: There are COMB(50,10) = 10,272,278,170 different subsets of size 10 in a set of 50 elements. Each of these subsets is a possible outcome when using simple random sampling (METHOD=SRS) for selecting 10 out of 50. However, by switching to systematic random sampling only 5 possible outcomes remain. Hence, the probability of obtaining the same sample twice in two independent draws is <0.00000001% in the first case (SRS), but 20% in the second case (SYS). (The probability of obtaining two disjoint samples would be higher with METHOD=SYS, though.)
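For anyone who wants to check these figures, here is a minimal DATA step sketch. It assumes the 50/10 example above, where the sampling interval 50/10 = 5 is an integer, so a systematic draw has exactly 5 possible outcomes. The variable names are made up for illustration.

```sas
/* Sketch only: reproduces the arithmetic of the 50/10 example. */
data _null_;
  n_srs = comb(50, 10);      /* number of possible SRS samples of size 10 out of 50 */
  n_sys = 50 / 10;           /* number of possible systematic samples (interval = 5) */

  p_same_srs = 1 / n_srs;    /* two independent SRS draws give the identical sample */
  p_same_sys = 1 / n_sys;    /* two independent SYS draws give the identical sample */

  /* two independent draws share no unit at all */
  p_disjoint_srs = comb(40, 10) / comb(50, 10);
  p_disjoint_sys = (n_sys - 1) / n_sys;

  put n_srs= comma20.;
  put p_same_srs= e12. p_same_sys= percent8.1;
  put p_disjoint_srs= percent8.2 p_disjoint_sys= percent8.1;
run;
```

The results are written to the log. The two "disjoint" probabilities back up the remark in parentheses: systematic draws are either identical or completely disjoint, whereas SRS draws usually overlap partially.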
You could use the START= method option to ensure disjoint samples under certain conditions. Please note that this precludes the use of SEED= and OUTSEED (please see the documentation for details). In this case I would first sort dataset DATA1 randomly within strata and then use method=sys(start=1) and method=sys(start=2) for the two draws.
Just out of curiosity: Do you also get the message
WARNING: Ignoring second data set reference.
in the log with the code you posted?
03-22-2016 09:20 PM
Thanks for your reply!
I tried the START= method option, but an error occurred:
proc surveyselect data = data1 out = selected method = sys (start=1)
sampsize = temp;
strata x y z;
ERROR 22-322: Syntax error, expecting one of the following: ;, CERTAIN, CERTSIZE, DATA, JTPROBS, M, MAXSIZE, METHOD, MINSIZE, N, NMAX, NMIN, NOPRINT, OUT, OUTALL, OUTHITS, OUTSEED, OUTSIZE, OUTSORT, RATE, REPS, SAMPRATE, SAMPSIZE, SEED, SELECTALL, SORT, SRSALG, STATS.
I am using SAS 9.2. Is it a new feature in a later version?
Yes, I got the warning message "WARNING: Ignoring second data set reference."
It does not seem to affect the result, but I don't know what the warning is for.
03-23-2016 06:12 AM
Yes, it seems that the START= method option was newly introduced in SAS/STAT 13.1, hence it's unavailable in SAS 9.2.
You could draw your first sample and either
Thus, you would obtain disjoint samples (if that's what you're after and not just different samples with a possible overlap).
Or you double the sample sizes in the first draw and then draw the second sample from the first.
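A rough sketch of the "double the sample sizes" idea, under several assumptions: TEMP1 has one row per stratum with the strata variables x, y, z and the sample size in _NSIZE_; DATA1 is sorted by x y z; and I use METHOD=SRS with arbitrary seeds instead of your METHOD=SYS setup. The dataset names TEMP2, POOL and BOTH are invented for illustration.

```sas
/* Sketch only: dataset names and seeds are hypothetical. */

/* 1) Double the per-stratum sample sizes. */
data temp2;
  set temp1;
  _nsize_ = 2 * _nsize_;
run;

/* 2) Draw a pool of twice the required size from each stratum.           */
/*    SELECTALL takes the whole stratum if it has fewer units than asked. */
proc surveyselect data=data1 out=pool(drop=SelectionProb SamplingWeight)
                  method=srs sampsize=temp2 seed=12345 selectall;
  strata x y z;
run;

/* 3) Split the pool: draw the original size from it and keep every unit. */
/*    OUTALL adds the variable Selected (1 = selected, 0 = not selected). */
proc surveyselect data=pool out=both outall
                  method=srs sampsize=temp1 seed=67890 selectall;
  strata x y z;
run;

/* Selected = 1 is one sample, Selected = 0 is the other; they cannot overlap. */
data sample_a sample_b;
  set both;
  if Selected = 1 then output sample_a;
  else output sample_b;
run;
```

Note that this draws both samples afresh, so it only helps if the first sample can be redrawn. In strata where doubling exceeds the stratum size, the second sample will be smaller than requested.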