gf_danny
Calcite | Level 5

Problem: Even with a different initial seed, the same sample is still selected in some strata.

 

Detail: I have already finished a sample selection (about 5,000 units selected from a base of 300,000) using proc surveyselect, but now I have to do it again. This time I need to make sure the units already selected won't be selected again (population sizes and sample sizes are unchanged).

 

I have tried putting _seed_ + 1 as the new initial seed in the file "temp1", where _seed_ is the initial seed from my first surveyselect run. However, some strata still select the same sample as in my first run.

 

What should I do to ensure that the already selected sample is not selected again?

 

Here is my code:

proc surveyselect data = data1 out = sample_selected method = sys
                   seed = temp1 sampsize = temp1 outseed;
strata x y z;
run;

 

FreelanceReinh
Jade | Level 19

Hello @gf_danny,

 

Well, there is always a certain probability of obtaining the same random sample twice. With a large number of strata and possibly extreme sampling rates in some of them this probability can be substantial. But one of the most efficient ways to increase this probability dramatically is to use systematic random sampling (METHOD=SYS). Why did you choose this method?

 

Example: There are COMB(50,10) = 10,272,278,170 different subsets of size 10 in a set of 50 elements. Each of these subsets is a possible outcome when using simple random sampling (METHOD=SRS) for selecting 10 out of 50. However, by switching to systematic random sampling only 5 possible outcomes remain. Hence, the probability of obtaining the same sample twice in two independent draws is <0.00000001% in the first case (SRS), but 20% in the second case (SYS). (The probability of obtaining two disjoint samples would be higher with METHOD=SYS, though.)
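The figures in this example can be checked with a short DATA step. This is only a sketch of the arithmetic under the example's assumptions (10 out of 50, two independent draws, and a sampling interval of exactly 5 for SYS); the variable names are made up for illustration.

```sas
/* Sketch: reproduce the counts and probabilities from the example above. */
data _null_;
  n_srs = comb(50, 10);              /* possible samples under METHOD=SRS          */
  n_sys = 50 / 10;                   /* possible samples under METHOD=SYS (5 starts) */

  p_same_srs = 1 / n_srs;            /* same sample twice, SRS                     */
  p_same_sys = 1 / n_sys;            /* same sample twice, SYS                     */

  /* Disjoint draws: under SRS the second sample must avoid all 10 selected units; */
  /* under SYS the 5 possible samples partition the set, so any other start works. */
  p_disj_srs = comb(40, 10) / n_srs;
  p_disj_sys = 1 - p_same_sys;

  put n_srs= comma20. n_sys=;
  put p_same_srs= e12. p_same_sys= percent8.1;
  put p_disj_srs= percent8.1 p_disj_sys= percent8.1;
run;
```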

 

You could use the START= method option to ensure disjoint samples under certain conditions. Please note that this precludes the use of SEED= and OUTSEED (please see the documentation for details). In this case I would first sort dataset DATA1 randomly within strata and then use method=sys(start=1) and method=sys(start=2) for the two draws.
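A minimal sketch of the "sort randomly within strata" step mentioned above, assuming the strata variables x, y, z from the original post. The dataset name data1_rand, the variable _u and the seed value are hypothetical and chosen only for illustration.

```sas
/* Sketch: give every observation a uniform random number and   */
/* sort by it within each stratum, so the order inside a        */
/* stratum is random before systematic selection is applied.    */
data data1_rand;
  set data1;
  call streaminit(20240101);   /* hypothetical seed, makes the sort reproducible */
  _u = rand('uniform');        /* hypothetical helper variable                   */
run;

proc sort data=data1_rand;
  by x y z _u;
run;
```

The sorted dataset would then serve as the input for the two systematic draws described above.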

 

Just for curiosity: Do you also get the message

WARNING: Ignoring second data set reference.

in the log with the code you posted?

gf_danny
Calcite | Level 5

Thanks for your reply!

I tried the start= method option but an error occurred:

 

proc surveyselect data = data1 out = selected method = sys (start=1)
                  sampsize = temp;
strata x y z;
run;
ERROR 22-322: Syntax error, expecting one of the following: ;, CERTAIN, CERTSIZE, DATA, JTPROBS, M, MAXSIZE, METHOD, MINSIZE, N, NMAX, NMIN, NOPRINT, OUT, OUTALL, OUTHITS, OUTSEED, OUTSIZE, OUTSORT, RATE, REPS, SAMPRATE, SAMPSIZE, SEED, SELECTALL, SORT, SRSALG, STATS.

I am using SAS 9.2. Is it a new feature in a later version?

 

 

Yes, I got the warning message "WARNING: Ignoring second data set reference."

but it does not seem to affect the result. I don't know what the warning is for.

FreelanceReinh
Jade | Level 19

Yes, it seems that the START= method option was newly introduced in SAS/STAT 13.1, hence it's unavailable in SAS 9.2.

 

You could draw your first sample and either

  • create a new dataset from DATA1 with the selected observations removed and use this as the basis for the second draw
  • or flag the selected observations in DATA1 (say, with a 0-1 variable SELECTED) and draw the second sample from DATA1 with the restriction where not selected;

Thus, you would obtain disjoint samples (if that's what you're after and not just different samples with a possible overlap).
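The first option could look roughly like the sketch below. It assumes that DATA1 contains a unique identifier, here called id (a hypothetical name, since the original post does not show one), and that sample_selected is the output of the first draw. The names data1_rest and sample_selected2 and the seed value are also made up for illustration.

```sas
/* Sketch: remove the units selected in the first draw, */
/* then draw the second sample from what is left.       */
proc sql;
  create table data1_rest as
  select *
  from data1
  where id not in (select id from sample_selected)  /* id: assumed unique key */
  order by x, y, z;                                 /* STRATA needs sorted input */
quit;

proc surveyselect data=data1_rest out=sample_selected2 method=sys
                  seed=67890          /* hypothetical numeric seed        */
                  sampsize=temp1;     /* same per-stratum sizes as before */
  strata x y z;
run;
```

Because the selected units are no longer in the input, the second sample cannot overlap with the first, whatever seed is used. If a stratum has fewer remaining units than its requested sample size, the second draw would need separate handling.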

 

Or you double the sample sizes in the first draw and then draw the second sample from the first.

gf_danny
Calcite | Level 5

thx for the reply!

Finally I am going to remove the selected sample from data1.

 

Thank you

