<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Assign new random number to data by date in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567138#M75113</link>
    <description></description>
    <pubDate>Wed, 19 Jun 2019 05:29:22 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2019-06-19T05:29:22Z</dc:date>
    <item>
      <title>Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567088#M75104</link>
      <description>Hi, I want to assign a new random number to each row of a master data set, by date. The same random number may appear on different dates, but not more than once within the same date. Appreciate your help!&lt;BR /&gt;&lt;BR /&gt;Example input:&lt;BR /&gt;Date Id Entries&lt;BR /&gt;1/2/19 A 1&lt;BR /&gt;1/2/19 B 2&lt;BR /&gt;1/2/19 B 2&lt;BR /&gt;2/3/19 C 3&lt;BR /&gt;2/3/19 C 3&lt;BR /&gt;2/3/19 C 3&lt;BR /&gt;2/3/19 A 2&lt;BR /&gt;2/3/19 A 2&lt;BR /&gt;&lt;BR /&gt;Output that I want:&lt;BR /&gt;Date Id Entries Randomnumber&lt;BR /&gt;1/2/19 A 1 0.287&lt;BR /&gt;1/2/19 B 2 0.758&lt;BR /&gt;1/2/19 B 2 0.958&lt;BR /&gt;2/3/19 C 3 0.286&lt;BR /&gt;2/3/19 C 3 0.346&lt;BR /&gt;2/3/19 C 3 0.183&lt;BR /&gt;2/3/19 A 2 0.983&lt;BR /&gt;2/3/19 A 2 0.758</description>
      <pubDate>Tue, 18 Jun 2019 23:03:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567088#M75104</guid>
      <dc:creator>saf_nadia</dc:creator>
      <dc:date>2019-06-18T23:03:17Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567090#M75105</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set have;
  randomnumber = rand('uniform');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Unless you have a fairly large number of records for the same date, a simple call is unlikely to cause much of a problem.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jun 2019 23:26:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567090#M75105</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-06-18T23:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567093#M75106</link>
      <description>I actually have 93 different dates. How do I generate 93 sets of random numbers, one set per date?</description>
      <pubDate>Tue, 18 Jun 2019 23:38:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567093#M75106</guid>
      <dc:creator>saf_nadia</dc:creator>
      <dc:date>2019-06-18T23:38:04Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567101#M75107</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/224742"&gt;@saf_nadia&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The rand() function returns a number with 8-byte precision (around 15 digits). The risk of getting exactly the same random number twice is very small unless you have a really big data set. For this reason you could just create random numbers without having to care about the date groups.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want to create random numbers with lower precision, where there is a real risk of repetition, you need to maintain a kind of blacklist and check every newly generated number against it to see whether it has already been used. The code below is one possible way to do that.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input Date :ddmmyy. Id $ Entries;
  format date date9.;
  datalines;
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2
;

data want(drop=_:);

  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('randomnumber');
      h1.defineDone();
    end;

  set have;
  by date;
  if first.date then h1.clear();

  do while(1);
    randomnumber = round(rand('uniform'),0.001);
    if h1.check() ne 0 then 
      do;
        h1.add();
        leave;
      end;

    /* avoid endless loop */
    _n=sum(_n,1);
    if _n=10000 then 
      do;
        put 'aborting job after 10000 trials to create a new unused random number';
        abort;
      end;
  end;

run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 19 Jun 2019 00:32:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567101#M75107</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-06-19T00:32:50Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567111#M75110</link>
      <description>Patrick, my data has 1 million observations, although only 93 different dates. I tried generating the random numbers using the simple code by ballardw and found around 233 duplicate random numbers in the dupout data set.</description>
      <pubDate>Wed, 19 Jun 2019 01:34:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567111#M75110</guid>
      <dc:creator>saf_nadia</dc:creator>
      <dc:date>2019-06-19T01:34:38Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567138#M75113</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/224742"&gt;@saf_nadia&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, you are absolutely right and what you observed is actually fully documented.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3 class="xis-title"&gt;&lt;EM&gt;Duplicate Values&lt;/EM&gt;&lt;/H3&gt;
&lt;DIV id="n00n7zzhvlf7e9n1jj0ojhsbh1as" class="xis-topicContent"&gt;
&lt;DIV id="n0oz5aqxfenimyn1ka43l27pugug" class="xis-paragraph"&gt;&lt;EM&gt;The RNG algorithms used by the RAND function have extremely long periods, but this does not imply that large random samples are devoid of duplicate values. With the default 32-bit Mersenne Twister algorithm, the RAND function returns at most 2&lt;SUP class="xis-superscript"&gt;32&lt;/SUP&gt;&amp;nbsp;distinct values. In a random uniform sample of size 10&lt;SUP class="xis-superscript"&gt;5&lt;/SUP&gt;, the chance of drawing at least one duplicate is greater than 50%. The expected number of duplicates in a random uniform sample of size M is approximately M&lt;SUP class="xis-superscript"&gt;2&lt;/SUP&gt;/2&lt;SUP class="xis-superscript"&gt;33&lt;/SUP&gt;&amp;nbsp;when M is much less than&amp;nbsp;&lt;SPAN class="xis-nobr"&gt;2&lt;SUP class="xis-superscript"&gt;32&lt;/SUP&gt;&lt;/SPAN&gt;. For example, you should expect about 115 duplicates in a random uniform sample of size M=10&lt;SUP class="xis-superscript"&gt;6&lt;/SUP&gt;. These results are consequences of the famous “birthday matching problem” in probability theory.&lt;/EM&gt;&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;&lt;EM&gt;&lt;A href="https://go.documentation.sas.com/?docsetId=lefunctionsref&amp;amp;docsetTarget=p0fpeei0opypg8n1b06qe4r040lv.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en#p04b3euoqp9oewn1g8ckgx2ljawl"&gt;https://go.documentation.sas.com/?docsetId=lefunctionsref&amp;amp;docsetTarget=p0fpeei0opypg8n1b06qe4r040lv.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en#p04b3euoqp9oewn1g8ckgx2ljawl&lt;/A&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
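&lt;DIV class="xis-paragraph"&gt;As a rough check of the figures quoted above, the following is a minimal sketch that evaluates the quoted approximation M&lt;SUP class="xis-superscript"&gt;2&lt;/SUP&gt;/2&lt;SUP class="xis-superscript"&gt;33&lt;/SUP&gt; together with the standard birthday-problem approximation 1 - exp(-M(M-1)/(2N)) for the chance of at least one duplicate. The second formula and all variable names are assumptions made for illustration; they are not taken from the documentation.&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Birthday-problem arithmetic, assuming 2**32 equally likely values */
data _null_;
  n = 2**32;                                   /* distinct values RAND can return */
  do m = 1e5, 1e6;                             /* sample sizes from the quote     */
    expected_dups = m**2 / 2**33;              /* approximation quoted above      */
    p_any_dup     = 1 - exp(-m*(m-1)/(2*n));   /* chance of at least one repeat   */
    put m= expected_dups= 8.1 p_any_dup= 6.3;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;For M=10&lt;SUP class="xis-superscript"&gt;5&lt;/SUP&gt; this works out to roughly 1.2 expected duplicates and a chance of about 0.69 of seeing at least one; for M=10&lt;SUP class="xis-superscript"&gt;6&lt;/SUP&gt; the approximation gives about 116 expected duplicates, in line with the "about 115" quoted.&lt;/DIV&gt;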
&lt;DIV class="xis-paragraph"&gt;The code I posted previously should work. You just need to allow for more digits (that is, remove the round() function). The amended code is below.&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input Date :ddmmyy. Id $ Entries;
  format date date9.;
  datalines;
1/2/19 A 1
1/2/19 B 2
1/2/19 B 2
2/3/19 C 3
2/3/19 C 3
2/3/19 C 3
2/3/19 A 2
2/3/19 A 2
;

data want(drop=_:);

  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('randomnumber');
      h1.defineDone();
    end;

  set have;
  by date;
  if first.date then h1.clear();

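  /* keep drawing until we get a value not yet used for this date */
  do while(1);
    randomnumber = rand('uniform');
    if h1.check() ne 0 then   /* non-zero return code: value not in the hash yet */
      do;
        h1.add();
        leave;
      end;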
    /* avoid endless loop */
    _n=sum(_n,1);
    if _n=10000 then 
      do;
        put 'aborting job after 10000 trials to create a new unused random number';
        abort;
      end;
  end;

run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;The ranuni() function seems to work differently and there I don't get duplicates.&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample(keep=randomnumber);
  format randomnumber best32.;
  do i=1 to 10**7;
/*    randomnumber = rand('uniform');*/
    randomnumber = ranuni(1);
    output;
  end;
  stop;
run;

proc sort data=sample nodupkey dupout=duplicates;
  by randomnumber;
run;

proc print data=duplicates;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
NOTE: No observations in data set WORK.DUPLICATES.&lt;/DIV&gt;
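&lt;DIV class="xis-paragraph"&gt;Since the requirement is only that a number must not repeat within the same date, a duplicate check can include the date in the sort key. This is a minimal sketch, assuming the WANT data set created by the hash step above; the data set names want_check and dups_within_date are made up for illustration.&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* A row lands in dups_within_date only if the same */
/* randomnumber occurs more than once on one date   */
proc sort data=want out=want_check nodupkey dupout=dups_within_date;
  by date randomnumber;
run;

proc print data=dups_within_date;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;DIV class="xis-paragraph"&gt;With the hash approach the duplicates data set should come out empty, because a value is only accepted once it is confirmed to be unused for the current date.&lt;/DIV&gt;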
&lt;/DIV&gt;</description>
      <pubDate>Wed, 19 Jun 2019 05:29:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567138#M75113</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-06-19T05:29:22Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567313#M75123</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/224742"&gt;@saf_nadia&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Patrick, my data has 1 million observations, although only 93 different dates. I tried generating the random numbers using the simple code by ballardw and found around 233 duplicate random numbers in the dupout data set.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Can you tell us what the purpose of the random number is? Perhaps there is another method that would do what you need.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2019 15:09:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567313#M75123</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-06-19T15:09:23Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567360#M75124</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/224742"&gt;@saf_nadia&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;. Do you really need random numbers and not just some sort of unique observation ID? It happens to all of us that we get stuck on a coding problem and focus on that instead of considering alternative solutions, so we ask for help climbing the downpipe instead of asking for help finding the staircase.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if you need random numbers, and need them in random order too, you can cheat by making more random numbers than necessary and keeping distinct values, before they are added to the data set. I have an example here:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Make some test data - data set HAVE;
data have; 
	do var1 = 1 to 1000000; 
		var2 = mod(var1,10)+1;
		output; 
	end; 
run;

* Get the number of observations in HAVE;
proc sql noprint;
	select count(*) into :obs
	from have;
quit;

* Generate 2*obs random numbers;
data tmp1; 
	do i = 1 to %eval(&amp;amp;obs*2);
		randomnumber = round(rand('uniform')*10000000);
		output;
	end;
run;

* Sort by randomnumber;
proc sort data=tmp1; by randomnumber i;
run;

* Get rid of duplicates;
data tmp2; set tmp1; by randomnumber;
	if first.randomnumber;
run;

* Sort back in random order;
proc sort data=tmp2 out=randomlist (drop=i); by i;
run;

* Add random number to input records;
* Keep input sorting of both data sets;
data want; merge have (in=have_exist) randomlist;
	if not have_exist then stop;
	output;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
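&lt;P&gt;If a unique ID per date is all that is needed, another option is to shuffle the rows within each date and number them. This is only a sketch under assumptions: it uses the HAVE data set from the original question (with a Date variable), not the test data above, and the names keyed, _key and random_id are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Attach a random sort key to every row;
data keyed;
  set have;
  _key = rand('uniform');
run;

* Shuffle the rows within each date;
proc sort data=keyed;
  by date _key;
run;

* Number the shuffled rows, restarting at 1 for each date;
data want(drop=_key);
  set keyed;
  by date;
  if first.date then random_id = 0;
  random_id + 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The IDs are unique within a date by construction, even if two rows happen to draw the same sort key. Note that the output is sorted by date in shuffled order, so the original row order is not kept.&lt;/P&gt;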
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2019 17:04:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567360#M75124</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-06-19T17:04:09Z</dc:date>
    </item>
    <item>
      <title>Re: Assign new random number to data by date</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567569#M75136</link>
      <description>Hi Erik. I guess I just need some unique ID without any duplication. I find this works best for my problem. Thanks Erik!</description>
      <pubDate>Thu, 20 Jun 2019 10:30:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Assign-new-random-number-to-data-by-date/m-p/567569#M75136</guid>
      <dc:creator>saf_nadia</dc:creator>
      <dc:date>2019-06-20T10:30:13Z</dc:date>
    </item>
  </channel>
</rss>

