Hi @saf_nadia
I agree with @ballardw - Do you really need random numbers and not just some sort of unique observarion ID? - It happens to all of us that we are stuck in a coding problem and focus on that instead of considering alternative solutions, so we ask for help to climb the downpipe instead of asking for help to find the staircase.
But if you need random numbers, and need them in random order too, you can cheat by making more random numbers than necessary and keeping distinct values, before they are added to the data set. I have an example here:
* Make some test data - data set HAVE;
data have;
do var1 = 1 to 1000000;
var2 = mod(var1,10)+1;
output;
end;
run;
* Get observations from have;
proc sql noprint;
select count(*) into :obs
from have;
quit;
* Generate 2*obs random numbers;
data tmp1;
do i = 1 to %eval(&obs*2);
randomnumber = round(rand('uniform')*10000000);
output;
end;
run;
* Sort by randomnumber;
proc sort data=tmp1; by randomnumber i;
run;
* Get rid of duplicates;
data tmp2; set tmp1; by randomnumber;
if first.randomnumber;
run;
* Sort back in random order;
proc sort data=tmp2 out=randomlist (drop=i); by i;
run;
* Add random number to input records;
* Keep input sorting of both data sets;
data want; merge have (in=have_exist) randomlist;
if not have_exist then stop;
output;
run;
data want;
set have;
randomnumber = rand('uniform');
run;
unless you have a pretty large number of records for the same date there's not likely to be much problem with a simple call.
The rand() function will return a number with 8 byte precision (around 15 digits). The risk that you get the exactly same random number twice is very very small unless you've got a really big data set. For this reason you could just create random numbers without having to care about the date groups.
If you want to create random numbers with lower precision where you would have the risk of repetition then you would need to maintain a sort of black list to always check if a new generated number has been used already. Below code is a possible way to do something like that.
data have;
input Date :ddmmyy. Id $ Entries;
format date date9.;
datalines;
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2
;
data want(drop=_:);
if _n_=1 then
do;
dcl hash h1();
h1.defineKey('randomnumber');
h1.defineDone();
end;
set have;
by date;
if first.date then h1.clear();
do while(1);
randomnumber = round(rand('uniform'),0.001);
if h1.check() ne 0 then
do;
h1.add();
leave;
end;
/* avoid endless loop */
_n=sum(_n,1);
if _n=10000 then
do;
put 'aborting job after 10000 trials to create a new unused random number';
abort;
end;
end;
run;
Yes, you are absolutely right and what you observed is actually fully documented.
data have;
input Date :ddmmyy. Id $ Entries;
format date date9.;
datalines;
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2
;
data want(drop=_:);
if _n_=1 then
do;
dcl hash h1();
h1.defineKey('randomnumber');
h1.defineDone();
end;
set have;
by date;
if first.date then h1.clear();
do while(1);
randomnumber = rand('uniform');
if h1.check() ne 0 then
do;
h1.add();
leave;
end;
/* avoid endless loop */
_n=sum(_n,1);
if _n=10000 then
do;
put 'aborting job after 10000 trials to create a new unused random number';
abort;
end;
end;
run;
data sample(keep=randomnumber);
format randomnumber best32.;
do i=1 to 10**7;
/* randomnumber = rand('uniform');*/
randomnumber = ranuni(1);
output;
end;
stop;
run;
proc sort data=sample nodupkey dupout=duplicates;
by randomnumber;
run;
proc print data=duplicates;
run;
NOTE: No observations in data set WORK.DUPLICATES.
@saf_nadia wrote:
Patrick, ny data has 1 million observations although only 93 different dates. Have tried to generate the random number using simple code by ballard and checked there are around 233 dupout random number.
Can you tell us what the purpose of the random number? Perhaps there is another method that would do what you need.
Hi @saf_nadia
I agree with @ballardw - Do you really need random numbers and not just some sort of unique observarion ID? - It happens to all of us that we are stuck in a coding problem and focus on that instead of considering alternative solutions, so we ask for help to climb the downpipe instead of asking for help to find the staircase.
But if you need random numbers, and need them in random order too, you can cheat by making more random numbers than necessary and keeping distinct values, before they are added to the data set. I have an example here:
* Make some test data - data set HAVE;
data have;
do var1 = 1 to 1000000;
var2 = mod(var1,10)+1;
output;
end;
run;
* Get observations from have;
proc sql noprint;
select count(*) into :obs
from have;
quit;
* Generate 2*obs random numbers;
data tmp1;
do i = 1 to %eval(&obs*2);
randomnumber = round(rand('uniform')*10000000);
output;
end;
run;
* Sort by randomnumber;
proc sort data=tmp1; by randomnumber i;
run;
* Get rid of duplicates;
data tmp2; set tmp1; by randomnumber;
if first.randomnumber;
run;
* Sort back in random order;
proc sort data=tmp2 out=randomlist (drop=i); by i;
run;
* Add random number to input records;
* Keep input sorting of both data sets;
data want; merge have (in=have_exist) randomlist;
if not have_exist then stop;
output;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.