PGStats
Opal | Level 21

The reason why weighted selection without replacement cannot be done when some of the relative weights are greater than 1/sampsize is quite simple. The goal of the sampling method is to give every sampling unit a selection probability of weight/sum(weight) on each draw, so that over sampsize draws a unit is expected to appear sampsize*weight/sum(weight) times. But since a unit can appear at most once in the sample (because replacement is not allowed), that expected count cannot exceed 1, which means the relative weight weight/sum(weight) cannot exceed 1/sampsize. Thus, the goal is mathematically impossible to achieve for units with relative weights greater than 1/sampsize.
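To see which units break this condition before sampling, a check along the following lines may help. It is only a sketch using Base SAS: the data set name `units`, its toy values, and the macro variable `sampsize` are made up for illustration and are not taken from the original question.

```sas
/* Hypothetical toy data: five units, one with a dominant weight */
data units;
  input Unit_ID weight;
  datalines;
1 10
2 20
3 30
4 40
5 400
;

%let sampsize = 3;   /* hypothetical sample size */

/* List the units whose relative weight exceeds 1/sampsize.        */
/* PROC SQL remerges sum(weight) onto every row, so the log shows  */
/* a note about remerging summary statistics; that is expected.    */
proc sql;
  select Unit_ID,
         weight,
         weight / sum(weight) as rel_weight format=6.4
    from units
    having calculated rel_weight > 1 / &sampsize;
quit;
```

With these toy values the weights sum to 500, so only unit 5 (relative weight 0.8) exceeds 1/3. Any unit listed by such a check cannot receive its target probability in a without-replacement sample of that size.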

PG
Rick_SAS
SAS Super FREQ

> I would like to use a standard proc that is available in SAS.

Do you have access to PROC IML? 

BerryH
Obsidian | Level 7

Unfortunately not.

Tom
Super User

I'm not really an expert on PROC SURVEYSELECT, but if you want to generate a random value from 1 to N with weighted probabilities, use the RAND('table', ...) function. To convert your weights into percentages, divide each weight by the sum of the weights.
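As a small illustration of that idea (with replacement only), here is a sketch that turns four weights into probabilities and draws from them. The weights, the seed, and the data set name `draws` are arbitrary values chosen for the example.

```sas
data draws;
  call streaminit(2024);                   /* arbitrary seed so the run is repeatable */
  array w[4]   _temporary_ (10 20 30 40);  /* hypothetical weights */
  array pct[4] _temporary_;
  total = sum(of w[*]);
  do i = 1 to 4;
    pct[i] = w[i] / total;                 /* weight divided by sum of weights */
  end;
  do draw = 1 to 1000;
    j = rand('table', of pct[*]);          /* j is 1..4, chosen with probability pct[j] */
    output;
  end;
  keep draw j;
run;

/* The observed frequencies of j should be roughly 10%, 20%, 30%, 40% */
proc freq data=draws;
  tables j;
run;
```

One thing to watch: if the probabilities passed to RAND('table', ...) sum to slightly less than 1 because of rounding, the function can return N+1, so it is worth guarding against that value.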

 

To sample without replacement, you could try removing each sampled case and recalculating the percentages over the remaining cases before the next draw. I am not sure if this is mathematically correct, but it should at least lead to a fairly close approximation of random.

 


data one;
  input Unit_ID weight @@;
  datalines;
 1  237.18    2 567.89    3  118.50    4   74.38     5 1287.23     6  258.10
 7  325.36    8 218.38    9 1670.80   10  134.71    11 2020.70    12   47.80
13 1183.45   14 330.54   15  780.10   16  895.80    17  620.10    18  420.18
19  979.66   20 810.25   21  670.85   22  314.58    23   87.50    24 1893.40
25  753.30   26 540.65   27 2580.35   28  230.56    29  185.60    30  688.43
31  505.14   32 205.48   33  650.42   34 1348.34    35   30.50    36 2214.80
37  940.35   38 217.85   39  142.90   40  806.90    41  560.72
;

proc sql noprint;
select count(*) into :nobs trimmed from one;
quit;


%let samp=13;
data sample;
* Load the original weights into a temporary array ;
* and accumulate their sum in TOTAL_WT ;
  if _n_=1 then do;
     TOTAL_WT=0;
     do i=1 to &nobs;
       set one point=i  ;
       array wt [1:&nobs] _temporary_;
       wt[i]=weight;
       total_wt + weight ;
     end;
  end;
  do choice=1 to &samp;
* Re-calculate percentages over the units not yet selected ;
    array pct [1:&nobs] _temporary_;
    do i=1 to &nobs;
      pct[i] = wt[i] / total_wt ;
    end;
    j=rand('table',of pct[*]);
    if j in (1:&nobs) then do;
      set one point=j;
      output;
      total_wt+-wt[j];
      wt[j]=0;
    end;
  end;
stop;
run;

proc print;
run;

 

Ksharp
Super User
Maybe I misunderstand something about your question.
Why not PROC SORT your dataset BY weight and pick the 12 obs with the largest weights?
