The reason weighted selection without replacement cannot be done when some of the relative weights are greater than 1/sampsize is quite simple. The goal of the sampling method is to select every sampling unit with probability proportional to its weight, so that a unit's probability of ending up in the sample is sampsize*weight/sum(weight). But since a unit can appear in the sample at most once (because replacement is not allowed), that probability cannot exceed 1, which means weight/sum(weight) cannot exceed 1/sampsize. Thus, the goal is mathematically impossible to achieve for units with relative weights greater than 1/sampsize.
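As a worked check of that limit, here are the numbers for the 41 weights posted further down in this thread with a sample size of 13 (the value used in the code there). The symbols are my own notation, not anything from SAS: $w_i$ is the weight of unit $i$, $W$ the sum of all weights, $n$ the sample size, and $\pi_i$ the probability that unit $i$ ends up in the sample.

```latex
\begin{align*}
\pi_i &= n\,\frac{w_i}{W} \le 1
  \quad\Longleftrightarrow\quad \frac{w_i}{W} \le \frac{1}{n}, \\
W &= \sum_{i=1}^{41} w_i = 28549.73, \qquad
  \frac{W}{n} = \frac{28549.73}{13} \approx 2196.13, \\
\pi_{27} &= \frac{13 \times 2580.35}{28549.73} \approx 1.175 > 1, \\
\pi_{36} &= \frac{13 \times 2214.80}{28549.73} \approx 1.009 > 1.
\end{align*}
```

So for that data and n = 13, units 27 and 36 are the two whose weights are too large. The next largest weight, 2020.70 for unit 11, is below the 2196.13 threshold.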
> I would like to use a standard proc that is available in SAS.
Do you have access to PROC IML?
Unfortunately not.
I'm not really an expert on PROC SURVEYSELECT, but if you want to generate a random value from 1 to N with weighted probabilities, use the RAND('table', ...) function. To convert your weights into probabilities, divide each one by the sum of the weights.
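A minimal sketch of that idea for draws with replacement. The three weights (10, 30, 60), the seed and the number of draws are made-up values for illustration only:

```sas
data _null_;
  call streaminit(12345);                    /* arbitrary seed, only for repeatability */
  array w [3] _temporary_ (10 30 60);        /* hypothetical weights                   */
  array p [3] _temporary_;
  total = sum(of w[*]);
  do i = 1 to 3;
    p[i] = w[i] / total;                     /* weight -> probability                  */
  end;
  do draw = 1 to 5;
    k = rand('table', of p[*]);              /* index 1..3 drawn with probability p[k] */
    put draw= k=;
  end;
run;
```

Each draw is independent here, so the same index can come up more than once.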
To sample without replacement you could try recalculating the probabilities after removing each sampled case. I am not sure this is mathematically correct, but it should at least give a fairly close approximation of the intended weighted sampling.
data one;
  input Unit_ID weight @@;
datalines;
1 237.18 2 567.89 3 118.50 4 74.38 5 1287.23 6 258.10
7 325.36 8 218.38 9 1670.80 10 134.71 11 2020.70 12 47.80
13 1183.45 14 330.54 15 780.10 16 895.80 17 620.10 18 420.18
19 979.66 20 810.25 21 670.85 22 314.58 23 87.50 24 1893.40
25 753.30 26 540.65 27 2580.35 28 230.56 29 185.60 30 688.43
31 505.14 32 205.48 33 650.42 34 1348.34 35 30.50 36 2214.80
37 940.35 38 217.85 39 142.90 40 806.90 41 560.72
;
proc sql noprint;
  select count(*) into :nobs trimmed from one;
quit;
%let samp=13;
data sample;
  array wt  [1:&nobs] _temporary_;
  array pct [1:&nobs] _temporary_;
  * Load the original weights and sum them ;
  if _n_=1 then do;
    total_wt=0;
    do i=1 to &nobs;
      set one point=i;
      wt[i]=weight;
      total_wt + weight;
    end;
  end;
  do choice=1 to &samp;
    * Re-calculate the probabilities from the remaining weights ;
    do i=1 to &nobs;
      pct[i] = wt[i] / total_wt;
    end;
    * Draw an index j with probability pct[j] ;
    j=rand('table',of pct[*]);
    if 1 <= j <= &nobs then do;
      set one point=j;
      output;
      * Remove the selected unit so that it cannot be drawn again ;
      total_wt + (-wt[j]);
      wt[j]=0;
    end;
  end;
  stop;
run;
proc print data=sample;
run;
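Before running the selection, it may be worth checking whether the requested sample size is feasible at all, that is, whether any unit has a relative weight above 1/&samp. A small sketch, assuming the data set ONE and the macro variable SAMP from the code above. The column names rel_weight and target_prob are my own:

```sas
proc sql;
  /* List units whose relative weight exceeds 1/&samp.              */
  /* SUM(weight) is taken over the whole table; PROC SQL remerges   */
  /* it with the detail rows and writes a NOTE to the log about it. */
  select Unit_ID,
         weight,
         weight / sum(weight)         as rel_weight  format=8.4,
         &samp * weight / sum(weight) as target_prob format=8.4
    from one
    having weight / sum(weight) > 1 / &samp
    order by weight desc;
quit;
```

If the query returns no rows, every target probability is at most 1. Any unit it does list cannot be given a selection probability proportional to its weight at that sample size.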