Hello fellow SAS Users,
I have a (hopefully) straightforward question. I am building random samples for various data partitions using PROC SURVEYSELECT.
My goal is to build a random sample, with replacement, equal in size to the original sample, and to repeat this to form 500 random samples. One partition has a sample size of n=9,544, and my code looks like this:
PROC SURVEYSELECT DATA=PRE_TREAT SAMPSIZE=9544 METHOD=PPS_WR
OUT=PRE_TREAT_RANDOM
REPS=500;
SIZE VALUE;
RUN;
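As an aside, METHOD=PPS_WR with SIZE VALUE selects units with probability proportional to VALUE, not with equal probability. If an ordinary equal-probability bootstrap is what is intended, a common PROC SURVEYSELECT idiom is METHOD=URS (unrestricted random sampling, i.e. equal probability with replacement) with SAMPRATE=1, which draws each replicate at the same size as the input. The sketch below is only an illustration under assumptions: it runs on a small hypothetical stand-in data set (DEMO, 10 rows) instead of PRE_TREAT, and the SEED= value and data set names are made up.

```sas
/* Hypothetical 10-row stand-in for PRE_TREAT */
data DEMO;
  do id = 1 to 10;
    VALUE = id * 5;
    output;
  end;
run;

/* Equal-probability sampling with replacement (URS).          */
/* SAMPRATE=1 makes each replicate the same size as DEMO;      */
/* OUTHITS writes one row per selection, so duplicates appear. */
proc surveyselect data=DEMO out=DEMO_BOOT
                  method=urs samprate=1 reps=3 seed=12345 outhits;
run;
```

For the real data, DATA=PRE_TREAT and REPS=500 would replace the demo values; no SIZE statement is needed with METHOD=URS.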
The program runs, and the procedure output shows a Sample Size of 9,544.
The output data set, however, does not contain 9,544 observations per iteration. On the first iteration, for example, it has 2,461 observations, and it includes an additional variable, NumberHits, that records how many times each observation was selected in that iteration.
Everything seems to be running, but...
Here is my question:
For my purposes, if an observation is included multiple times in the random sample, I would have expected a duplicate observation to be added to the output data set for each extra selection, so that n=9,544. When I run simple analyses such as PROC MEANS, however, it reads n=2,461.
Is it possible to achieve this?
You could use NumberHits as the FREQ variable in PROC SUMMARY.
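A minimal sketch of that approach, assuming the default (non-OUTHITS) output layout: one row per distinct unit, with Replicate, NumberHits, and an analysis variable. The small DATA step is hypothetical data standing in for PRE_TREAT_RANDOM, and VALUE is used as the analysis variable only for illustration. The FREQ statement makes each row count NumberHits times, so N per replicate equals the total number of selections.

```sas
/* Hypothetical stand-in for the default SURVEYSELECT output: */
/* one row per distinct unit, NumberHits = times selected     */
data SAMPLE_DEFAULT;
  input Replicate VALUE NumberHits;
  datalines;
1 10 2
1 20 1
1 30 3
2 10 1
2 40 5
;
run;

/* FREQ NumberHits weights each row by its hit count, so N   */
/* and MEAN reflect all selections, not just distinct units  */
proc summary data=SAMPLE_DEFAULT nway;
  class Replicate;
  freq NumberHits;
  var VALUE;
  output out=REP_STATS n=N_obs mean=Mean_value;
run;

proc print data=REP_STATS noobs;
  var Replicate N_obs Mean_value;
run;
```

For replicate 1 this gives N_obs = 6 (2 + 1 + 3) and a mean of (2*10 + 20 + 3*30)/6, about 21.67. With the real data, pointing DATA= at PRE_TREAT_RANDOM should give N_obs = 9,544 for every replicate.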
Or you could look at the documentation and find the OUTHITS option:
OUTHITS
includes a distinct copy of each selected unit in the OUT= output data set when the same sampling unit is selected more than once. By default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a unit that is selected three times (NumberHits is 3).
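Applied to the original call, this means adding OUTHITS to the PROC SURVEYSELECT statement. The sketch below keeps METHOD=PPS_WR and SIZE VALUE from the question but runs on a small hypothetical stand-in (DEMO, with SAMPSIZE=10 and REPS=3) so the row counts are easy to check; the SEED= value is an assumption added for reproducibility. For the real data, substitute DATA=PRE_TREAT, SAMPSIZE=9544, and REPS=500.

```sas
/* Hypothetical 10-row stand-in for PRE_TREAT */
data DEMO;
  do id = 1 to 10;
    VALUE = id * 5;
    output;
  end;
run;

/* OUTHITS: a unit selected m times appears m times in OUT= */
proc surveyselect data=DEMO out=DEMO_RANDOM
                  method=pps_wr sampsize=10 reps=3 seed=20250101 outhits;
  size VALUE;
run;

/* Each replicate should now have SAMPSIZE rows */
proc sql;
  create table REP_COUNTS as
  select Replicate, count(*) as n_rows
  from DEMO_RANDOM
  group by Replicate;
quit;
```

With OUTHITS, PROC MEANS using BY Replicate (or CLASS Replicate) reports N equal to the sample size directly, with no FREQ statement needed.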
So simple, and I literally JUST found that option as I got the message that someone had replied.
Thanks so much!