wriccar
Fluorite | Level 6

Hello fellow SAS Users,

 

I have a (hopefully) straightforward question. I am building random samples for various data partitions using PROC SURVEYSELECT.

My goal is to build a random sample, with replacement, equal in size to the original sample, and to repeat this to form 500 random samples. One partition has a sample size of n=9,544 and my code looks like this:

 

PROC SURVEYSELECT DATA=PRE_TREAT SAMPSIZE=9544 METHOD=PPS_WR
OUT=PRE_TREAT_RANDOM
REPS=500;
SIZE VALUE;
RUN;

The program runs and the SAS Output of the procedure shows Sample Size of 9,544. 

The output data set, however, does not contain 9,544 observations. Rather, it has, for example on the first iteration, 2,461 observations and another variable "NumberHits" is included that specifies how many times a given observation was used in a given iteration. 

 

Everything seems to be running, but...

 

Here is my question:

For my purposes, if an observation is included multiple times in the random sample produced, then I would have expected a new duplicate observation to be added to the output data set such that n=9,544. I notice when running simple data analytics, such as PROC MEANS, it is reading n=2,461. 

 

Is it possible to achieve this? 

 

 

data_null__
Jade | Level 19

You could use NumberHits as the FREQ variable in PROC SUMMARY.
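As a rough sketch of that approach, assuming the output data set is PRE_TREAT_RANDOM as in the question, that VALUE is also the variable you want to summarize (an assumption; substitute your own analysis variable), and that REPS= has added the Replicate variable to the output, something like this should count each row NumberHits times. REP_STATS and the statistic names are made up for illustration.

```sas
/* Sketch: weight each selected row by its hit count instead of duplicating rows. */
proc summary data=PRE_TREAT_RANDOM nway;
  class Replicate;      /* one summary row per replicate created by REPS= */
  var VALUE;            /* assumption: VALUE is the analysis variable */
  freq NumberHits;      /* each row is counted NumberHits times */
  output out=REP_STATS  /* REP_STATS is a hypothetical output name */
         n=n_obs mean=mean_value;
run;
```

With the FREQ statement in place, the reported N for each replicate should reflect the total number of hits rather than the number of distinct rows.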

 

You could also look at the documentation, which describes this option:

 

OUTHITS

includes a distinct copy of each selected unit in the OUT= output data set when the same sampling unit is selected more than once. By default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a unit that is selected three times (NumberHits is 3).
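Applied to the code in the question, a minimal sketch would add OUTHITS to the PROC SURVEYSELECT statement and leave everything else unchanged. The SEED= value is my own addition for reproducibility and is arbitrary; drop it if you do not need repeatable draws.

```sas
/* Sketch: same call as in the question, with OUTHITS added so that a unit */
/* selected m times appears as m rows in the OUT= data set.                */
proc surveyselect data=PRE_TREAT
                  sampsize=9544
                  method=pps_wr
                  out=PRE_TREAT_RANDOM
                  outhits          /* one output row per hit */
                  reps=500
                  seed=12345;      /* assumption: arbitrary seed, not in the original */
  size VALUE;
run;
```

Each replicate in PRE_TREAT_RANDOM should then contain 9,544 rows, so procedures such as PROC MEANS run BY Replicate would see the full sample size without needing a FREQ statement.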

 

wriccar
Fluorite | Level 6

So simple, and I literally JUST found that option as I got the message that someone had replied. 

Thanks so much!

 

