@mkeintz wrote:
BTW, with the streaminit value I used, there were only 11 instances of duplicates to be skipped.
This is very plausible: I observed between 8 and 23 duplicates in a few trials. According to formulas I've just found in Feller (1968), Volume I, p. 225, the expected number of duplicates is 15.56 (see code below). So in this case the real cost of avoiding duplicates is maintaining the lookup table, not the roughly 0.003% of additional samples needed on average.
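%let n=%sysevalf(26**7); /* number of possible 7-letter strings */
%let r=500000;           /* number of distinct strings required */
data _null_;
  /* sum statement: s accumulates 1/(n-k) for k=0..r-1, starting from 0 */
  do k=0 to &r-1;
    s+1/(&n-k);
  end;
  E_exact_=&n*s-&r;                         /* expected number of duplicate draws */
  E_approx=&n*log((&n+0.5)/(&n-&r+0.5))-&r; /* logarithmic approximation */
  put (E:)(=best16./);
run;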
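For readers who want the reasoning behind the two expressions in the code above, here is a sketch as I understand it (my own reconstruction, not a quotation from Feller). Once k distinct strings have been collected, each new draw yields a new string with probability (n-k)/n, so the expected number of draws needed for the next new string is n/(n-k). Summing over k and subtracting the r useful draws gives the expected number of duplicates; the logarithm comes from approximating the sum by an integral, and the last term is only the leading-order approximation:

```latex
\[
E[\text{duplicates}]
  \;=\; \sum_{k=0}^{r-1} \frac{n}{n-k} \;-\; r
  \;\approx\; n \ln\!\frac{n+\tfrac12}{\,n-r+\tfrac12\,} \;-\; r
  \;\approx\; \frac{r^{2}}{2n}
\]
```

With n = 26^7 = 8,031,810,176 and r = 500,000, the last term gives r^2/(2n) ≈ 15.56, which agrees with the value quoted above.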
/* Try the UUIDGEN() function.
   If you only want letters,
   you could get rid of those digits. */
data a;
  do i=1 to 50000;
    want=uuidgen(123);
    output;
  end;
run;
The OP has selected an answer, but I am still curious about the reason for this question. @KatLinden Why do you want the character strings to be random? How will these strings be used?
The discussion on this thread inspired me to think about this problem and write up a solution. My approach: Use base 26 to convert a set of unique integers into a set of unique strings. If you expect to assign IDs to N subjects, you can use strings that have k characters, where N < 26^k.
The primary advantage of this technique over some of the other proposals is that it ensures uniqueness of the ID values. You don't have to check whether a random string has already been assigned.
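To make the base-26 idea concrete, here is a minimal sketch of how it could be coded. It is not the code from the write-up mentioned above. The macro variables nsubj and len, the data set name ids, and the choice of the integers 0 to nsubj-1 are all assumptions made for illustration.

```sas
/* Sketch only: map the integers 0..nsubj-1 to distinct strings of   */
/* length len by writing each integer in base 26 with digits A-Z.    */
/* nsubj, len and the data set name IDS are hypothetical.            */
%let nsubj = 1000;  /* number of IDs wanted; must not exceed 26**len */
%let len   = 3;     /* number of characters per ID                   */

data ids;
  length id $ &len;
  do i = 0 to &nsubj - 1;
    x  = i;
    id = repeat('A', &len - 1);        /* REPEAT returns len copies of 'A' */
    do pos = &len to 1 by -1;          /* fill from the rightmost position */
      /* base-26 digit 0..25 -> letter A..Z */
      substr(id, pos, 1) = substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', mod(x, 26) + 1, 1);
      x = floor(x / 26);
    end;
    output;
  end;
  keep i id;
run;
```

With these settings, i=0 gives AAA, i=1 gives AAB, and i=26 gives ABA. Because distinct integers have distinct base-26 representations, no duplicate check is needed. The IDs produced this way are in sequence rather than random-looking; if that matters, the integers could first be shuffled or sampled without replacement from 0 to 26**len - 1.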