setup initial seed to select different sample in ...
03-22-2016 03:51 AM

Problem: Even with a different initial seed, the same sample is still being selected in some strata.

Detail: I have already completed a sample selection with PROC SURVEYSELECT (about 5,000 observations selected from a population of 300,000), but now I have to do it again. This time I need to make sure that the observations already selected are not selected again (population size and sample size unchanged).

I tried storing _seed_ + 1 as the new initial seed in data set "temp1", where _seed_ is the initial seed from my first SURVEYSELECT run. However, some strata still select the same sample as in my first run.

What should I do to make sure that already selected observations are not selected again?

Here is my code:

```sas
proc surveyselect data = data1 out = sample_selected method = sys
                  seed = temp1 sampsize = temp1 outseed;
   strata x y z;
run;
```


Posted in reply to gf_danny

03-22-2016 10:14 AM

Hello @gf_danny,

Well, there is always a certain probability of obtaining the same random sample twice. With a large number of strata and possibly extreme sampling rates in some of them, this probability can be substantial. But one of the most effective ways to increase this probability dramatically is to use systematic random sampling (METHOD=SYS). Why did you choose this method?

Example: There are COMB(50,10) = 10,272,278,170 different subsets of size 10 in a set of 50 elements. Each of these subsets is a possible outcome when using simple random sampling (METHOD=SRS) to select 10 out of 50. However, with systematic random sampling only 5 possible outcomes remain: the sampling interval is 50/10 = 5, so for a fixed sort order the random start alone determines the whole sample. Hence, the probability of obtaining the same sample twice in two independent draws is <0.00000001% in the first case (SRS), but 20% in the second case (SYS). (The probability of obtaining two *disjoint* samples would be higher with METHOD=SYS, though: 80% versus about 8% with SRS.)
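The arithmetic of this example can be checked with a short DATA step using the SAS COMB function. This is a sketch; the SYS counts assume the frame's sort order is fixed, so that the 5 possible starting points are the only source of randomness.

```sas
/* Check the 10-out-of-50 example: number of possible samples and the  */
/* probabilities of getting the same / disjoint samples in two draws.   */
data _null_;
   n_srs = comb(50, 10);              /* possible SRS samples: 10,272,278,170    */
   n_sys = 50 / 10;                   /* possible SYS samples: one per start, 5  */

   p_same_srs = 1 / n_srs;            /* both SRS draws give the same sample     */
   p_same_sys = 1 / n_sys;            /* both SYS draws use the same start       */

   p_disj_srs = comb(40, 10) / n_srs; /* 2nd SRS avoids all 10 units of the 1st  */
   p_disj_sys = (n_sys - 1) / n_sys;  /* any other start gives a disjoint sample */

   put n_srs= comma20. n_sys=;
   put p_same_srs= e12. p_same_sys= percent8.1;
   put p_disj_srs= percent8.2 p_disj_sys= percent8.1;
run;
```

The log should show a same-sample probability of about 1E-10 for SRS versus 20% for SYS, and a disjoint-sample probability of about 8% for SRS versus 80% for SYS.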

You could use the START= method option to ensure disjoint samples under certain conditions. Please note that this precludes the use of SEED= and OUTSEED (please see the documentation for details). In this case I would first sort dataset DATA1 randomly within strata and then use method=sys(start=1) and method=sys(start=2) for the two draws.
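The START= approach described above might look roughly like this. It is a hedged sketch, not tested against a specific release: it assumes SAS/STAT 13.1 or later (where, as noted further down in the thread, START= seems to have been introduced), reuses DATA1, TEMP1 and the strata X Y Z from the original post, and uses an arbitrary seed (20160322) only to make the random shuffle reproducible. Whether the two draws are truly disjoint depends on the conditions given in the SURVEYSELECT documentation (presumably a sampling interval of at least 2 in every stratum), so check the result.

```sas
/* Hedged sketch: requires SAS/STAT 13.1 or later (START= option).      */
/* DATA1, TEMP1 and strata X Y Z come from the original post; the seed  */
/* 20160322 is arbitrary and only makes the shuffle reproducible.       */

/* Step 1: attach a uniform random sort key to every observation */
data shuffled;
   set data1;
   if _n_ = 1 then call streaminit(20160322);
   _u = rand('uniform');
run;

/* Step 2: sort by strata, then randomly within each stratum */
proc sort data=shuffled;
   by x y z _u;
run;

/* Step 3: two systematic draws that differ only in their start point. */
/* No SEED= or OUTSEED here, because START= precludes them.            */
proc surveyselect data=shuffled out=sample1 method=sys(start=1)
                  sampsize=temp1;
   strata x y z;
run;

proc surveyselect data=shuffled out=sample2 method=sys(start=2)
                  sampsize=temp1;
   strata x y z;
run;
```

Because START= fixes the starting point, all of the randomness comes from the shuffle in Steps 1 and 2.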

Just out of curiosity: do you also get the message

WARNING: Ignoring second data set reference.

in the log with the code you posted?


Posted in reply to FreelanceReinhard

03-22-2016 09:20 PM

Thanks for your reply!

I tried the START= option, but an error occurred:

```sas
proc surveyselect data = data1 out = selected method = sys (start=1)
                  sampsize = temp;
   strata x y z;
run;
```

ERROR 22-322: Syntax error, expecting one of the following: ;, CERTAIN, CERTSIZE, DATA, JTPROBS, M, MAXSIZE, METHOD, MINSIZE, N, NMAX, NMIN, NOPRINT, OUT, OUTALL, OUTHITS, OUTSEED, OUTSIZE, OUTSORT, RATE, REPS, SAMPRATE, SAMPSIZE, SEED, SELECTALL, SORT, SRSALG, STATS.

I am using SAS 9.2. Is this a new feature in a later version?

Yes, I do get the warning message "WARNING: Ignoring second data set reference.", but it does not seem to affect the result. I don't know what the warning is for.


Posted in reply to gf_danny

03-23-2016 06:12 AM

Yes, it seems that the START= method option was newly introduced in SAS/STAT 13.1, hence it's unavailable in SAS 9.2.

You could draw your first sample and either

- create a new dataset from DATA1 with the selected observations removed and use this as the basis for the second draw
- or flag the selected observations in DATA1 (say, with a 0-1 variable SELECTED) and draw the second sample from DATA1 with the restriction `where not selected;`
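Both options in the list above can be sketched as follows. The sketch assumes a hypothetical variable ID that uniquely identifies each observation in DATA1, that the first sample is SAMPLE_SELECTED from the original PROC SURVEYSELECT step, and that TEMP1 still holds the sample sizes. METHOD=SRS and the seed 12345 are illustrative choices only; METHOD=SYS from the original post would be plugged in the same way.

```sas
/* Hypothetical: ID uniquely identifies observations in DATA1, and    */
/* SAMPLE_SELECTED is the first sample from the original PROC step.   */

/* Option 1: build a new frame without the already-selected units */
proc sql;
   create table data1_rest as
      select *
      from data1
      where id not in (select id from sample_selected)
      order by x, y, z;               /* STRATA needs input sorted by strata */
quit;

proc surveyselect data=data1_rest out=sample2_a method=srs
                  seed=12345 sampsize=temp1;
   strata x y z;
run;

/* Option 2: flag the selected units with a 0-1 variable SELECTED */
proc sql;
   create table data1_flag as
      select d.*,
             case when s.id is not null then 1 else 0 end as selected
      from data1 as d
           left join sample_selected as s
           on d.id = s.id
      order by d.x, d.y, d.z;
quit;

proc surveyselect data=data1_flag out=sample2_b method=srs
                  seed=12345 sampsize=temp1;
   where not selected;                /* skip units drawn the first time */
   strata x y z;
run;
```

In both cases each stratum must still contain at least as many remaining observations as the requested sample size, so strata where the first draw took more than half of the units need special attention.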

Thus, you would obtain disjoint samples (if that's what you're after and not just *different* samples with a possible overlap).

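Or you could double the sample sizes in the first draw and then draw the second sample from the first (the observations of the doubled sample that are not selected in the second draw then form the other sample).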
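A possible sketch of the doubling approach follows. It assumes, hypothetically, that TEMP1 stores the per-stratum sample sizes in a variable named _NSIZE_ (SAMPSIZE= data sets use _NSIZE_ or SampleSize), that every stratum has at least twice that many observations, and that SRS with the arbitrary seeds 1111 and 2222 is acceptable. The observations selected in Step 3 form one sample and the rest of the doubled sample forms the other, so the two are disjoint by construction.

```sas
/* Hypothetical: TEMP1 holds per-stratum sample sizes in _NSIZE_, and  */
/* every stratum of DATA1 has at least 2 * _NSIZE_ observations.        */

/* Step 1: double the sample size in every stratum */
data temp2;
   set temp1;
   _nsize_ = 2 * _nsize_;
run;

/* Step 2: draw the doubled sample from DATA1; the first-stage weight */
/* variables are dropped so they do not clash with those of Step 3    */
proc surveyselect data=data1 out=big(drop=SelectionProb SamplingWeight)
                  method=srs seed=1111 sampsize=temp2;
   strata x y z;
run;

/* Step 3: draw half of BIG; OUTALL keeps every row of BIG and adds */
/* the 0-1 variable SELECTED                                         */
proc surveyselect data=big out=big_split method=srs
                  seed=2222 sampsize=temp1 outall;
   strata x y z;
run;

/* Step 4: split into two disjoint samples of the original size */
data sample_a sample_b;
   set big_split;
   if selected then output sample_a;
   else output sample_b;
run;
```

Any weight variables in BIG_SPLIT describe only the split in Step 3, not the selection from DATA1, so recompute weights if the samples are used for estimation.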


Posted in reply to FreelanceReinhard

03-23-2016 10:19 PM

thx for the reply!

In the end, I am going to remove the selected sample from data1.

Thank you FreelanceReinhard