RANUNI performs unexpectedly for specific seed values (9.4 m6)

kdcheek · Posted 09-22-2021 11:37 PM

For a class with group assignments, I wish to match groups randomly to serve as peer reviewers for other groups. Specifically I wish to randomly match two peer reviewer groups to each project group. Each peer reviewer group will then also have groups randomly assigned to review it. I have written a basic DATA step with nested SET statements to perform this matching. As a seed value for the RANUNI function I have simply chosen the week of the term for which a particular set of pairings applies - so that I can replicate the pairings at a later date if I so choose. (Full code attached.)

After a few uses I noticed that for some seed values the DATA step ended before every group had peers assigned to it. For seed=1 the results actually vary from run to run. For most seed values, though, it works as expected. I replaced the RANUNI function with the newer RAND function (with which I am not very familiar), and the DATA step seems to run without any issues for every seed value I have tested.

I have pored over the code and cannot find the source of the issue. I am here hoping that more skilled coders than me might help. (As an aside: there may be better ways to frame the sample process, or better ways to ensure randomness. I am happy to open separate threads on those issues if appropriate. But for this thread I am specifically interested in the behavior of RANUNI.)

ballardw · Posted 09-23-2021 12:13 AM

Your code attachment didn't make it.

Instead of an attachment, copy the text from your editor, open a code box using the little "running man" above the message box and then paste the code in the box.

kdcheek · Posted 09-23-2021 12:29 AM

proc format library=work ;
	value groupname
		1 = "Group 01 - Project A"
		2 = "Group 02 - Project B"
		3 = "Group 03 - Project C"
		4 = "Group 04 - Project D"
		5 = "Group 05 - Project E"
		6 = "Group 06 - Project F"
	;
run ;

data groups ;
do group = 1 to 6 ;
output ;
end ;
run ;

%let week = 2 ;




data peers_to_review (keep=selected1 selected2 group peer1 peer2 rename=(group=Reviewer)) ;
	format selected1 selected2 $64. ;
	format peer1 peer2 group groupname. ;
	label group = "My Group" peer1 = "First Group That I am Reviewing"
		peer2 = "Second Group That I Am Reviewing" ;

	/* selected1 and selected2 accumulate the peers already assigned (comma-separated) */
	retain selected1 selected2 "Z" ;
	set groups ;

	/* first peer: redraw until the candidate is another group not yet used as peer1 */
	do j = 1 to 50000 ;
*		rand = rand("Integer", 1, nobs1 ) ;
		rand = int(ranuni(&week.)*nobs1)+1 ;
		set groups (rename=(group=peer1)) nobs=nobs1 point=rand ;
		if peer1^=group and index(selected1,compress(","||trim(put(peer1,z2.))))=0 then do ;
			leave ;
		end ;
	end ;
	selected1 = catx(',',selected1,compress(put(peer1,z2.))) ;

	/* second peer: same idea, but the candidate must also differ from peer1 */
	do k = 1 to 500000 ;
*		rand = rand("Integer", 1, nobs2 ) ;
		rand = int(ranuni(&week.)*nobs2)+1 ;
		set groups (rename=(group=peer2)) nobs=nobs2 point=rand ;
		if peer2^=group and index(selected2,compress(","||trim(put(peer2,z2.))))=0
			and peer2^=peer1 then do ;
			output ;
			leave ;
		end ;
	end ;
	selected2 = catx(',',selected2,compress(put(peer2,z2.))) ;

run ;


	title 'Who Am I Reviewing?' ;
	proc print data=peers_to_review label ; 
	var Reviewer Peer1 Peer2 / style(data)={just=l} style(header)={just=l} ; run ;





data peers_by_whom (keep=peer1 Reviewer1 Reviewer2 rename=(peer1=MyGroup )) ;
format Reviewer1 Reviewer2 peer1 groupname. ;

label peer1 = "My Group" Reviewer1 = "First Group Reviewing My Group" 
	Reviewer2 = "Second Group Reviewing My Group" ;

set peers_to_review (keep = Reviewer peer1 rename=(Reviewer=Reviewer1)) ;
do i = 1 to nobs ;
	set peers_to_review (keep = Reviewer peer2 rename=(Reviewer=Reviewer2)) nobs=nobs point=i ; 
	if peer1=peer2 then do ; output ; leave ; end ;
end ;
run ; 

title "Who's Reviewing Me?" ;
proc sort data=peers_by_whom ; by MyGroup ; run ;

	proc print data=peers_by_whom label ; 
	var MyGroup Reviewer1 Reviewer2 / style(data)={just=l} style(header)={just=l} ;
	run ;


title ;

FreelanceReinh · Posted 09-23-2021 09:06 AM

Here is an alternative approach to the random assignment:

%let m=6; /* number of groups (should be <=7) */

/* Create all combinations of two derangements of (1,...,&m) which are also derangements of each other */

data comb(keep=x: y:);
array x[&m] (1:&m);
array y[&m] (1:&m);
n=fact(&m);
do i=1 to n;
  c=allperm(i, of x[*]);
  do j=1 to n;
    d=allperm(j, of y[*]);
    do k=1 to &m;
      if x[k]=k | y[k]=k | x[k]=y[k] then leave;
    end;
    if k>&m then output;
  end;
end;
run;

/* Assign peers randomly, following the rules */

data want(drop=x: y:);
call streaminit(27182818);
r=rand('integer',c);
array x[&m];
array y[&m];
set comb point=r nobs=c;
do group=1 to &m;
  peer1=x[group];
  peer2=y[group];
  output;
end;
stop;
format peer1 peer2 group groupname.;
label group = "My Group"
      peer1 = "First Group That I am Reviewing"
      peer2 = "Second Group That I Am Reviewing";
run;

For &m=6 the number of observations in dataset COMB is 21,280. (For &m=7 it's 1,073,760 and for &m>=8 it would be better to develop a more efficient approach.)

Dataset COMB contains all possible combinations of two derangements (=permutations without fixed points) of the &m groups following the rules. Picking one of these randomly is equivalent to a two-step approach that yields every admissible assignment with equal probability. One could also store the combinations in a temporary array or hash object instead of a dataset, in particular if only a single assignment needs to be created so that COMB would be used only once.

kdcheek · Posted 09-23-2021 09:43 AM

I like this approach! The one concern I can foresee is that I may have 20 or more groups in a large class. I will consider some ways to make this approach a little more compact, though.

FreelanceReinh · Posted 09-23-2021 10:05 AM

@kdcheek wrote:
I like this approach! The one concern I can foresee is that I may have 20 or more groups in a large class. I will consider some ways to make this approach a little more compact, though.

20 or more groups? This would render the "COMB" approach totally impossible (please do not test it). For &m=20 the number of DO-loop iterations would exceed (20!)² = 5.9...E36, i.e., it would take many centuries even on a supercomputer, quite apart from the limited disk space (I guess).
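To put that iteration count in perspective, here is a rough back-of-the-envelope calculation. The rate of 10^9 loop iterations per second is purely an assumption for illustration, not a measured figure:

$$
20! = 2{,}432{,}902{,}008{,}176{,}640{,}000 \approx 2.43 \times 10^{18},
\qquad
(20!)^2 \approx 5.92 \times 10^{36}
$$

$$
\frac{5.92 \times 10^{36}\ \text{iterations}}{10^{9}\ \text{iterations/s}}
\approx 5.9 \times 10^{27}\ \text{s}
\approx 1.9 \times 10^{20}\ \text{years}
$$

Under that assumed rate, "many centuries" is if anything an understatement.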

Maybe PROC PLAN could be used for this purpose or SAS/OR (which I don't have).

kdcheek · Posted 09-23-2021 10:17 AM

Yes, it would have to be compressed *a lot* - I was thinking about a hybrid of your approach and the original approach. My interim thought is to evaluate the original approach at the N-1th step and see if the previous matches all exclude the Nth group, and force the N-1th selection to take the choice (the Nth group) that leaves an option for the Nth case (a non-Nth group). This would presumably affect the character of the randomness - but I can deal with that separately.

FreelanceReinh · Posted 09-23-2021 11:09 AM

Here's a new approach which is similar to your original code in that random assignments are iterated until they meet the requirements.

%let m=6; /* number of groups */
%let maxiter=1e5; /* maximum number of iterations */

/* Create random permutations, check the rules, then assign peers if requirements are met */

data want(drop=seed k);
array x[&m] _temporary_ (1:&m);
array y[&m] _temporary_ (1:&m);
seed=27182818;
do _n_=1 to &maxiter;
  call ranperm(seed, of x[*]);
  call ranperm(seed, of y[*]);
  do k=1 to &m;
    if x[k]=k | y[k]=k | x[k]=y[k] then leave;
  end;
  if k>&m then leave;
end;
if _n_<=&maxiter then do;
  do group=1 to &m;
    peer1=x[group];
    peer2=y[group];
    output;
  end;
  put 'Success! Suitable assignment found after ' _n_ 'iteration(s).';
end;
else put "WAR" "NING: No suitable assignment found after &maxiter iterations.";
format peer1 peer2 group groupname.;
label group = "My Group"
      peer1 = "First Group That I am Reviewing"
      peer2 = "Second Group That I Am Reviewing";
run;

It looks like it can deal easily with &m=20 or even &m=100 groups because the probability that a random pair of permutations satisfies the rules is large enough to find a suitable assignment quickly.
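A rough estimate of that probability, using the COMB counts quoted earlier in the thread and assuming that CALL RANPERM produces independent, uniformly distributed permutations: the chance that one pair (x, y) passes the check is the number of admissible pairs divided by the number of all pairs.

$$
p_6 = \frac{21{,}280}{(6!)^2} = \frac{21{,}280}{518{,}400} \approx 0.041,
\qquad
p_7 = \frac{1{,}073{,}760}{(7!)^2} = \frac{1{,}073{,}760}{25{,}401{,}600} \approx 0.042
$$

$$
\text{expected number of iterations} = \frac{1}{p_m} \approx 24 \quad (m = 6,\ 7)
$$

For larger m, the classical asymptotic result for three-row Latin rectangles suggests that this probability approaches e^{-3} ≈ 0.050, i.e., about 20 iterations on average regardless of the number of groups. I have not verified this limit numerically here.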

kdcheek · Posted 09-26-2021 01:21 PM

Yes, this is great as well. Thank you for the feedback. This approach may lend itself to something similar in IML, too.

FreelanceReinh · Posted 09-23-2021 04:48 AM

Hello @kdcheek,

Just to answer your specific question: I think the process is set up in such a way that the assignments stop "prematurely" with a certain probability. Hence, this can occur with any random number generator, not only with RANUNI. In repeated runs of the code I have occasionally observed this "early stopping" (for peer1 and/or peer2) with the RAND function as well. Note that without using the CALL STREAMINIT routine the initial seed of the RAND function changes from run to run: see section Reproducing a Random Number Stream in the RAND function documentation. This is similar to using RANUNI with seed 0. With a positive seed for RANUNI, however, it just depends on the seed whether or not the assignments stop "prematurely."
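A minimal sketch of the reproducibility point, assuming the week number is used as the seed as in the original post (the data step itself is only illustrative): calling CALL STREAMINIT once, before the first call to RAND, fixes the random number stream.

```sas
%let week = 2 ;

data _null_ ;
  /* Set the seed once, before the first call to RAND */
  call streaminit(&week.) ;
  do i = 1 to 5 ;
    r = rand("Integer", 1, 6) ;   /* random integer between 1 and 6 */
    put i= r= ;
  end ;
run ;
```

Rerunning this step should write the same five values to the log each time. Removing the CALL STREAMINIT statement should make them change from run to run.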

Example: If the first five assignments of peer1 happen to result in a permutation of (1, 2, 3, 4, 5), e.g., 4, 3, 5, 1, 2 (for groups 1, 2, 3, 4, 5, respectively), and there is of course a certain probability for this to happen, then the sixth assignment would necessarily violate one of the two IF conditions, peer1^=group or index(...)=0: the only group not yet selected is group 6 itself, and every other group has already been selected.
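To get a feeling for how often this happens, here is a hedged simulation sketch. It only mimics the peer1 step (not peer2), and it replaces the "redraw until admissible" loop with a direct uniform pick among the admissible peers, which should be equivalent in distribution. The seed, the number of repetitions, and all variable names are arbitrary choices for illustration.

```sas
%let m    = 6 ;       /* number of groups, as in the thread */
%let reps = 100000 ;  /* number of simulated runs (arbitrary) */

data _null_ ;
  call streaminit(12345) ;            /* arbitrary seed (assumption) */
  array used[&m] _temporary_ ;        /* used[p]=1 once p has been assigned as peer1 */
  deadends = 0 ;
  do rep = 1 to &reps ;
    do p = 1 to &m ;
      used[p] = 0 ;
    end ;
    stuck = 0 ;
    do group = 1 to &m ;
      /* count the peers that are still admissible for this group */
      navail = 0 ;
      do p = 1 to &m ;
        if p ne group and used[p] = 0 then navail = navail + 1 ;
      end ;
      /* dead end: no admissible peer is left */
      if navail = 0 then stuck = 1 ;
      if stuck then leave ;
      /* choose one of the admissible peers uniformly at random */
      pick = rand("Integer", 1, navail) ;
      chosen = . ;
      do p = 1 to &m ;
        if p ne group and used[p] = 0 then do ;
          pick = pick - 1 ;
          if pick = 0 then chosen = p ;
        end ;
      end ;
      used[chosen] = 1 ;
    end ;
    deadends = deadends + stuck ;
  end ;
  share = deadends / &reps ;
  put "Share of simulated runs that hit a dead end: " share percent8.2 ;
run ;
```

With this sequential scheme a dead end can only occur at the last group, because every earlier group still has at least two unused peers to choose from, at most one of which is the group itself.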

I'm sure there are ways to avoid these issues and also to simplify the code. For example, I don't think that you need the nested SET statements, given that all they retrieve is the random integer you had already determined before. I hope I'll have time to take a closer look at the problem later today or maybe someone else will chime in.
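As a sketch of that simplification (peer1 step only): since the values of GROUP in dataset GROUPS are simply 1, 2, ..., nobs, the random integer can serve as the candidate peer directly, without a second SET statement. The dataset name PEERS_SKETCH and the variable CAND are made up for this illustration; GROUPS and &week are taken from the original post.

```sas
data peers_sketch (keep=group peer1) ;
  length selected1 $64 ;
  retain selected1 "Z" ;
  set groups nobs=nobs1 ;
  peer1 = . ;
  do j = 1 to 50000 ;
    /* the random integer is itself the candidate peer */
    cand = int(ranuni(&week.)*nobs1) + 1 ;
    if cand ne group and index(selected1, cats(",", put(cand, z2.))) = 0 then do ;
      peer1 = cand ;
      leave ;
    end ;
  end ;
  /* peer1 stays missing if no admissible candidate was found */
  if peer1 = . then put "NOTE: No admissible peer found for " group= ;
  else selected1 = catx(",", selected1, put(peer1, z2.)) ;
run ;
```

This does not remove the dead-end problem described above. It only makes a failed search visible as a missing PEER1 and a note in the log.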

See also Rick Wicklin's blog article Six reasons you should stop using the RANUNI function to generate random numbers.

kdcheek · Posted 09-23-2021 09:35 AM

Ah, I overlooked the case where 1-5 are matched to 1-5 before the matches for 6 are generated. Thanks!
