06-08-2016 04:28 PM
Hello fellow SAS Users,
I have a (hopefully) straightforward question. I am building random samples for various data partitions using PROC SURVEYSELECT.
My goal is to build a random sample, with replacement, equal in size to the original sample, repeated to form 500 random samples. One partition has a sample size of n=9,544 and my code looks like this:
PROC SURVEYSELECT DATA=PRE_TREAT SAMPSIZE=9544 METHOD=PPS_WR OUT=PRE_TREAT_RANDOM REPS=500;
   SIZE VALUE;
RUN;
The program runs and the SAS Output of the procedure shows Sample Size of 9,544.
The output data set, however, does not contain 9,544 observations. Rather, it has, for example on the first iteration, 2,461 observations and another variable "NumberHits" is included that specifies how many times a given observation was used in a given iteration.
Everything seems to be running, but...
Here is my question:
For my purposes, if an observation is included multiple times in the random sample produced, then I would have expected a new duplicate observation to be added to the output data set such that n=9,544. I notice when running simple data analytics, such as PROC MEANS, it is reading n=2,461.
Is it possible to achieve this?
06-08-2016 04:37 PM
You could use NumberHits as the FREQ variable in PROC SUMMARY (or PROC MEANS).
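A minimal sketch of the FREQ approach, assuming PRE_TREAT_RANDOM was created by the code in the question (one row per selected unit, plus the NumberHits and Replicate variables that PROC SURVEYSELECT adds when REPS= is used). The analysis variable VALUE and the output data set name REP_STATS are placeholders for illustration only:

```sas
/* Weight each selected row by the number of times it was drawn.  */
/* With FREQ NumberHits, N in each replicate should equal the     */
/* requested sample size rather than the count of distinct rows.  */
proc summary data=PRE_TREAT_RANDOM nway;
   class Replicate;       /* one set of statistics per replicate  */
   var VALUE;             /* assumed analysis variable            */
   freq NumberHits;       /* each row counts NumberHits times     */
   output out=REP_STATS n=n mean=mean_value;
run;
```

This keeps the output data set small, since duplicates are never physically stored.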
Alternatively, look at the documentation for the OUTHITS option, which
includes a distinct copy of each selected unit in the OUT= output data set when the same sampling unit is selected more than once. By default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a unit that is selected three times (NumberHits is 3).
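As a sketch, the original step with OUTHITS added would look something like this. The data set and variable names are taken from the question; nothing else is changed:

```sas
/* OUTHITS writes one row per hit, so each replicate should       */
/* contain SAMPSIZE rows instead of one row per distinct unit.    */
proc surveyselect data=PRE_TREAT
                  method=PPS_WR
                  sampsize=9544
                  reps=500
                  outhits
                  out=PRE_TREAT_RANDOM;
   size VALUE;
run;
```

With 500 replicates of 9,544 rows each, the output data set would hold 4,772,000 observations, so the FREQ approach may be preferable if size is a concern. Note also that METHOD=PPS_WR selects with probability proportional to VALUE; if an equal-probability sample with replacement is what you want, METHOD=URS (with no SIZE statement) is the option to look at.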