Programming the statistical procedures from SAS

QUESTION: PROC SURVEYSELECT selection with replacement

Occasional Contributor
Posts: 19

Hello fellow SAS Users,

 

I have a (hopefully) straightforward question. I am building random samples for various data partitions using PROC SURVEYSELECT.

My goal is to draw a random sample, with replacement, equal in size to the original sample, and to repeat this to form 500 random samples. One partition has a sample size of n=9,544, and my code looks like this:

 

PROC SURVEYSELECT DATA=PRE_TREAT SAMPSIZE=9544 METHOD=PPS_WR
    OUT=PRE_TREAT_RANDOM
    REPS=500;
    SIZE VALUE;
RUN;

The program runs and the SAS Output of the procedure shows Sample Size of 9,544. 

The output data set, however, does not contain 9,544 observations. Rather, it has, for example on the first iteration, 2,461 observations and another variable "NumberHits" is included that specifies how many times a given observation was used in a given iteration. 

 

Everything seems to be running, but...

 

Here is my question:

For my purposes, if an observation is selected multiple times in a given random sample, I would have expected a duplicate observation to be added to the output data set for each selection, so that each sample has n=9,544. Instead, when I run simple summaries such as PROC MEANS, they report n=2,461.

 

Is it possible to achieve this? 

 

 

Respected Advisor
Posts: 3,780

Re: QUESTION: PROC SURVEYSELECT selection with replacement

You could use NumberHits as the FREQ variable in PROC SUMMARY (or PROC MEANS).
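
A minimal sketch of that approach, assuming the output data set from the original post (PRE_TREAT_RANDOM, which includes the Replicate variable that REPS= adds). VALUE is used as the analysis variable purely for illustration; substitute whichever variable you actually want to summarize.

```sas
/* Weight each selected unit by its hit count instead of duplicating rows. */
proc means data=PRE_TREAT_RANDOM n mean std;
    by Replicate;      /* one summary per replicate created by REPS=500 */
    freq NumberHits;   /* each row counts as NumberHits observations */
    var VALUE;         /* assumption: analysis variable chosen for illustration */
run;
```

With the FREQ statement, the reported N for each replicate is the sum of NumberHits, so it should match the requested sample size rather than the number of distinct rows.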

 

Or you could look at the documentation, where you will find this option:

 

OUTHITS

includes a distinct copy of each selected unit in the OUT= output data set when the same sampling unit is selected more than once. By default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a unit that is selected three times (NumberHits is 3).
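
As a sketch, the original call with OUTHITS added might look like the following. Everything except OUTHITS and SEED= comes from the original post; the SEED= value is an arbitrary assumption, included only so that the draw is reproducible.

```sas
/* Same selection as in the question, but write one row per hit. */
proc surveyselect data=PRE_TREAT
        method=pps_wr
        sampsize=9544
        reps=500
        seed=12345       /* assumption: arbitrary seed for reproducibility */
        outhits          /* emit a separate copy of a unit for each selection */
        out=PRE_TREAT_RANDOM;
    size VALUE;
run;
```

Each replicate should then contain 9,544 rows, for 500 x 9,544 = 4,772,000 rows in total, so procedures such as PROC MEANS with BY Replicate see the full sample size without needing a FREQ statement.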

 

Occasional Contributor
Posts: 19

Re: QUESTION: PROC SURVEYSELECT selection with replacement

So simple, and I literally JUST found that option as I got the message that someone had replied. 

Thanks so much!

 
