mmraja
Fluorite | Level 6

Hi there,

 

I'm automating a 2-stage probability-based sample selection method in SAS that will be used on 20 or so different input data sets, each with its own specifications. In stage 1, I'm selecting PSUs proportional to size and in stage 2, I'm selecting SSUs (individuals here) SRS within PSUs. The number of SSUs in each PSU is not equal, but that's not a big deal.

 

In my implementation, I use PROC SURVEYSELECT with METHOD=PPS. I have once gotten an error reading "For METHOD=PPS, the relative size of each sampling unit must not exceed (1/SAMPSIZE)" which I understand. It was well explained here: http://support.sas.com/kb/23/759.html 

 

To fix it, I've manually gone into the input sample list, removed the PSU causing trouble, selected that PSU with certainty for my sample (which seems logical to me since the probability of selection is greater than 1), then selected the remaining n-1 PSUs via PPS using PROC SURVEYSELECT.
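For readers following along, the manual workaround described above might look roughly like the sketch below. All names are hypothetical (the data set `frame`, the variables `psu_id` and `psu_size`, the offending PSU `'A17'`, and a target of n = 5 PSUs); it illustrates the idea and is not the exact code used.

```sas
/* Hypothetical names throughout: frame, psu_id, psu_size, 'A17', n = 5. */

/* Split the frame: the oversized PSU is taken with certainty. */
data certain rest;
  set frame;
  if psu_id = 'A17' then output certain;
  else output rest;
run;

/* Select the remaining n - 1 = 4 PSUs by PPS from the reduced frame. */
proc surveyselect data=rest method=pps sampsize=4 seed=12345 out=pps_part;
  size psu_size;
run;

/* Recombine; the certainty PSU gets selection probability and weight 1. */
data psu_sample;
  set certain(in=is_certain) pps_part;
  if is_certain then do;
    SelectionProb  = 1;
    SamplingWeight = 1;
  end;
run;
```

After removing a certainty PSU, it is worth rechecking the remaining PSUs against the new limit of 1/(n-1) of the reduced total, since another unit can cross the threshold.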

 

To me, this seems clunky and annoying, but I haven't been able to find another way to work around this issue. Does anyone know of a way to have SAS treat these PSUs as selected with certainty, rather than having it throw an error that the probability of selection is greater than 1? I've looked at the other variations on the PPS method in SAS, and they all seem to calculate the probability of selection the same way.

 

Thank you!!!!

4 REPLIES
Reeza
Super User

What happens if you force p=1 when p>1 for those specific records, rather than removing them?

If p=1, the unit is automatically selected.

mmraja
Fluorite | Level 6

So the PROC automatically calculates the probability of selection based on the sample size specified, the size of the PSU, and the total size of the stratum which the PSU is in. The formula is quite straightforward and can be found here: http://support.sas.com/kb/23/759.html

 

I am looking for a way to override part of that when the probability of selection is calculated to be greater than 1.
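The calculation described above can be checked ahead of time. The sketch below assumes a hypothetical frame data set `frame` with variables `stratum`, `psu_size`, and `n_psu` (the number of PSUs to select in that stratum); none of these names come from the original post. It computes n × size / stratum total for each PSU and flags the ones above 1.

```sas
/* Hypothetical names: frame, stratum, psu_size, n_psu. */
proc sql;
  create table psu_prob as
  select f.*,
         sum(f.psu_size) as stratum_size,
         /* selection probability implied by PPS: n * size / total size */
         f.n_psu * f.psu_size / sum(f.psu_size) as sel_prob
  from frame as f
  group by f.stratum;
quit;

/* List the PSUs that would trigger the METHOD=PPS error. */
data oversized;
  set psu_prob;
  if sel_prob > 1;
run;
```

PROC SQL remerges the stratum totals back onto each row here and writes a note to the log saying so. A list like `oversized` only identifies the first round of problem units; once those are set aside, the probabilities for the rest change and would need to be recomputed.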

Reeza
Super User

If it wasn't 6AM I might remember that 🙂 

 

It mentions using CERTSIZE or SAMPSIZE, but I'm assuming you want a non-manual way. One way would be to precalculate the selection probabilities yourself and then build the SAMPSIZE value via a macro.

It's too bad SELECTALL doesn't override that issue. 
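As a rough sketch of the CERTSIZE route: PROC SURVEYSELECT accepts a CERTSIZE=P= option with METHOD=PPS, which takes units whose share of the total size is at or above the given proportion with certainty and selects the rest by PPS. The data set and variable names below (`frame`, `stratum`, `psu_size`) and the values n = 5 and P = 0.2 (that is, 1/n) are assumptions for illustration only.

```sas
/* Hypothetical names and values: frame, stratum, psu_size, n = 5. */

/* STRATA processing expects the input sorted by the strata variable. */
proc sort data=frame;
  by stratum;
run;

proc surveyselect data=frame method=pps
                  sampsize=5         /* PSUs per stratum (assumed)    */
                  certsize=p=0.2     /* certainty threshold, here 1/n */
                  seed=12345 out=psu_sample;
  strata stratum;
  size psu_size;
run;
```

Two things should be confirmed against the documentation and the procedure output before relying on this: whether the requested sample size counts the certainty units, and how the threshold is reapplied after certainty units are removed. For sample sizes that differ by stratum, the documentation describes data-set forms of SAMPSIZE= and CERTSIZE=P=.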

mmraja
Fluorite | Level 6
Sadly, the sample size is non-negotiable, which is why I'm having this problem in the first place. The number of PSUs to be selected per stratum is calculated prior to selection according to a 2-stage, approximately self-weighting stratified design. Those numbers are chosen for the sake of generalizability after the fact.
