BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
saf_nadia
Fluorite | Level 6
Hi, I want to assign new random number to a master data by date. The random number could be the same but not within the same date. Appreciate your help!

Example input:
Date Id Entries
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2

Output that I want:
Date Id Entries Randomnumber
1/2/19 A 1 0.287
1/2/19 B 2 0.758
1/2/19 B 2 0.958
2/3/19 C 3 0.286
2/3/19 C 3 0.346
2/3/19 C 3 0.183
2/3/19 A 2 0.983
2/3/19 A 2 0.758
1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @saf_nadia 

 

I agree with @ballardw - Do you really need random numbers and not just some sort of unique observarion ID? - It happens to all of us that we are stuck in a coding problem and focus on that instead of considering alternative solutions, so we ask for help to climb the downpipe instead of asking for help to find the staircase.

 

But if you need random numbers, and need them in random order too, you can cheat by making more random numbers than necessary and keeping distinct values, before they are added to the data set. I have an example here:

 

* Make some test data - data set HAVE;
data have; 
	do var1 = 1 to 1000000; 
		var2 = mod(var1,10)+1;
		output; 
	end; 
run;

* Get observations from have;
proc sql noprint;
	select count(*) into :obs
	from have;
quit;

* Generate 2*obs random numbers;
data tmp1; 
	do i = 1 to %eval(&obs*2);
		randomnumber = round(rand('uniform')*10000000);
		output;
	end;
run;

* Sort by randomnumber;
proc sort data=tmp1; by randomnumber i;
run;

* Get rid of duplicates;
data tmp2; set tmp1; by randomnumber;
	if first.randomnumber;
run;

* Sort back in random order;
proc sort data=tmp2 out=randomlist (drop=i); by i;
run;

* Add random number to input records;
* Keep input sorting of both data sets;
data want; merge have (in=have_exist) randomlist;
	if not have_exist then stop;
	output;
run;

 

 

View solution in original post

8 REPLIES 8
ballardw
Super User

 

data want;

   set have;

   randomnumber = rand('uniform');

run;

 

unless you have a pretty large number of records for the same date there's not likely to be much problem with a simple call.

saf_nadia
Fluorite | Level 6
I actually have 93 different dates. How to do 93 sets of random number by dates?
Patrick
Opal | Level 21

@saf_nadia 

The rand() function will return a number with 8 byte precision (around 15 digits). The risk that you get the exactly same random number twice is very very small unless you've got a really big data set. For this reason you could just create random numbers without having to care about the date groups. 

 

If you want to create random numbers with lower precision where you would have the risk of repetition then you would need to maintain a sort of black list to always check if a new generated number has been used already. Below code is a possible way to do something like that.

data have;
  input Date :ddmmyy. Id $ Entries;
  format date date9.;
  datalines;
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2
;

data want(drop=_:);

  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('randomnumber');
      h1.defineDone();
    end;

  set have;
  by date;
  if first.date then h1.clear();

  do while(1);
    randomnumber = round(rand('uniform'),0.001);
    if h1.check() ne 0 then 
      do;
        h1.add();
        leave;
      end;

    /* avoid endless loop */
    _n=sum(_n,1);
    if _n=10000 then 
      do;
        put 'aborting job after 10000 trials to create a new unused random number';
        abort;
      end;
  end;

run;
saf_nadia
Fluorite | Level 6
Patrick, ny data has 1 million observations although only 93 different dates. Have tried to generate the random number using simple code by ballard and checked there are around 233 dupout random number.
Patrick
Opal | Level 21

@saf_nadia 

Yes, you are absolutely right and what you observed is actually fully documented.

 

Duplicate Values

The RNG algorithms used by the RAND function have extremely long periods, but this does not imply that large random samples are devoid of duplicate values. With the default 32-bit Mersenne Twister algorithm, the RAND function returns at most 232 distinct values. In a random uniform sample of size 105, the chance of drawing at least one duplicate is greater than 50%. The expected number of duplicates in a random uniform sample of size M is approximately M2/233 when M is much less than 232. For example, you should expect about 115 duplicates in a random uniform sample of size M=106. These results are consequences of the famous “birthday matching problem” in probability theory.
 
The code I've posted previously should work. You just need to allow for more digits (=remove the round() function). Below the amended code.
data have;
  input Date :ddmmyy. Id $ Entries;
  format date date9.;
  datalines;
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2
;

data want(drop=_:);

  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('randomnumber');
      h1.defineDone();
    end;

  set have;
  by date;
  if first.date then h1.clear();

  do while(1);
    randomnumber = rand('uniform');
    if h1.check() ne 0 then 
      do;
        h1.add();
        leave;
      end;
    /* avoid endless loop */
    _n=sum(_n,1);
    if _n=10000 then 
      do;
        put 'aborting job after 10000 trials to create a new unused random number';
        abort;
      end;
  end;

run;
The ranuni() function seems to work differently and there I don't get duplicates.
data sample(keep=randomnumber);
  format randomnumber best32.;
  do i=1 to 10**7;
/*    randomnumber = rand('uniform');*/
    randomnumber = ranuni(1);
    output;
  end;
  stop;
run;

proc sort data=sample nodupkey dupout=duplicates;
  by randomnumber;
run;

proc print data=duplicates;
run;
NOTE: No observations in data set WORK.DUPLICATES.
ballardw
Super User

@saf_nadia wrote:
Patrick, ny data has 1 million observations although only 93 different dates. Have tried to generate the random number using simple code by ballard and checked there are around 233 dupout random number.

Can you tell us what the purpose of the random number? Perhaps there is another method that would do what you need.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @saf_nadia 

 

I agree with @ballardw - Do you really need random numbers and not just some sort of unique observarion ID? - It happens to all of us that we are stuck in a coding problem and focus on that instead of considering alternative solutions, so we ask for help to climb the downpipe instead of asking for help to find the staircase.

 

But if you need random numbers, and need them in random order too, you can cheat by making more random numbers than necessary and keeping distinct values, before they are added to the data set. I have an example here:

 

* Make some test data - data set HAVE;
data have; 
	do var1 = 1 to 1000000; 
		var2 = mod(var1,10)+1;
		output; 
	end; 
run;

* Get observations from have;
proc sql noprint;
	select count(*) into :obs
	from have;
quit;

* Generate 2*obs random numbers;
data tmp1; 
	do i = 1 to %eval(&obs*2);
		randomnumber = round(rand('uniform')*10000000);
		output;
	end;
run;

* Sort by randomnumber;
proc sort data=tmp1; by randomnumber i;
run;

* Get rid of duplicates;
data tmp2; set tmp1; by randomnumber;
	if first.randomnumber;
run;

* Sort back in random order;
proc sort data=tmp2 out=randomlist (drop=i); by i;
run;

* Add random number to input records;
* Keep input sorting of both data sets;
data want; merge have (in=have_exist) randomlist;
	if not have_exist then stop;
	output;
run;

 

 

saf_nadia
Fluorite | Level 6
Hi Erik. I guess I just need some unique Id without any duplication. I find this work best to my problem. Thanks Erik!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1084 views
  • 3 likes
  • 4 in conversation