BookmarkSubscribeRSS Feed
chsprogramming
Calcite | Level 5

I would like to randomly re-assign records based on criteria. For example in the table below, I would like to re-assign 30% of records where the ID is 4 to be an ID of 2, and 10% of the records where the ID is 3 to be an ID of 2. I have looked into the rand function using both bernoulli and uniform and can't seem to get it right.

I believe using the rand("uniform") or rand("bernoulli") functions is what would work best, but I can't seem to get the code dynamic enough to account for the specific percentages and also re-assign based on two different criteria.

chsprogramming_0-1734547705457.png

 

3 REPLIES 3
PaigeMiller
Diamond | Level 26

Something like this:

 

if id=4 and rand('uniform')<0.3 then id=2;
else if id=3 and rand('uniform')<0.1 then id=2;
--
Paige Miller
ballardw
Super User

I would suggest using Proc SURVEYselect to select the ones for reassignment:

Unfortunately you provided "example" as a picture, not a working data step or even text.

The ID would be treated as a STRATA variable so for the procedure to work the data would have to be sorted by the id;

 

data have;
   input id count;
   /* create one observation per from summary for example data*/
   do i=1 to count;
      output;
   end;
   drop i count;
datalines;
1 3
2 2
3 8
4 10
;

proc surveyselect data=have out=selected  outall
    samprate =(100 100 10 30);
  ;
  strata id;
  id id;
run;

The output data set has an added variable "selected" that can be used in a data step to do the adjustment. One way:

data want;
   set selected;
   select (id);
      when (3) if selected=1 then id= 2;
      when (4) if selected=1 then id= 2;
      otherwise;
   end;
   drop selected  selectionprob samplingweight;
run;

I am using a Select/When block instead of if/then/else as I am suspecting this process may get pushed to more than a couple of values of "Id" and the select is easier to keep straight in that case. Since your example has both assigning a value of 2

   when (3,4) if selected=1 then id=2;

could be used. The other way  would be more flexible for different assignments for different Id values.

 

You said 10% of the 3s. But only showed 4 values. So I'm not quite sure what you expect. The SAMPRATE is a percentage. You can use SAMPSIZE to specify a specific number. The SAMPRATE or SAMPSIZE should have one entry for each Strata when the rate/counts may different. Don't mix the size or rate, pick one. If you want all of the values for a strata then the rate is 100 (or 1) or size the number of observations for the id.

With small numbers it is likely that the "rate" approach may not select any. 10% of 4 is 0.4 which would be the expected number of selected ID values of 3. Which means most likely not going to get any.

Tom
Super User Tom
Super User

Depends on what you mean.

If you just want to give each observation that has ID=4 a 30% CHANCE of being reassigned do something like:

data want;
  set have;
  if id=4 and rand('uniform')<=0.30 then id=2;
  else if id=3 and rand('uniform')<=0.10 then id=2;
run;

But since the choice on each is random (pseudorandom) it might be that 20% or 40% of them get transformed instead of 30%.

 

If you want to instead say that since there are 4 observations with ID=3 that you want only 1 of them transformed into ID=2 you will need to do something different.  In that case calculate how many you want transformed.  The randomly order the observations and change the first N.

proc freq data=have;
  tables id / noprint out=counts;
run;
data counts;
  set counts;
  if id=4 then count=ceil(count*0.30);
  else if id=3 then count=ceil(count*0.10);
  keep id count;
run;
data random;
  set have;
  randnum = rand('uniform');
run;
proc sort data=random;
  by id randnum;
run;
data want;
   merge random counts;
   by id;
   newid = id;
   if count>0 then if id in (3,4) then newid=2;
   output;
   count+(-1);
run;

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 221 views
  • 2 likes
  • 4 in conversation