BookmarkSubscribeRSS Feed
ALEXIA1
Calcite | Level 5

Hello, 

I am attempting to create a unique, random ID variable for all existing IDs within dataset y. I have played around with a bunch of code however, I get a few duplicate IDs each time. How do I transform the below code to produce a unique, random number between 1 and 10**8? I have used streaminit(1) to ensure the same random numbers are generated on each run. 

 

Data x;

Set y;

Call streaminit(1);
ID = RAND('Integer',(10 ** 8));

Run; 

 

Thank you in advance, 

6 REPLIES 6
Ksharp
Super User

You could try PROC SURVEYSELECT:

 

data have;
 set sashelp.class;
run;




data id;
do id=1 to 10**8;
 output;
end;
run;
%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid.,nlobs));
%let dsid=%sysfunc(close(&dsid.));
proc surveyselect data=id out=id2 seed=123 noprint sampsize=&nobs. outrandom;
run;
data want;
 merge have id2;
run;
Ksharp
Super User

Here is another way:

 

Data x;
if _n_=1 then do;
 length id 8;
 declare hash h(hashexp:20);
 h.definekey(key:'id');
 h.definedone();
end;
Set sashelp.class;

Call streaminit(1);
do until(rc ne 0);
  ID = RAND('Integer',1,10**8);
  rc=h.check();
end;
h.add();
drop rc;
Run; 
FreelanceReinh
Jade | Level 19

Hello @ALEXIA1,

 

If your new IDs just need to "look like" random numbers, you can also resort to the (deprecated) RANUNI function, which creates up to 2,147,483,646 unique pseudo-random  numbers by default:

data x;
set y;
do until(ID <= 1e8);
  ID = ranuni(2718)*(2**31-1);
end;
run;
Kurt_Bremser
Super User

Is there already an ID variable in the dataset which you want to mask with the new random ID? If yes, @Ksharp 's code needs to be expanded to take care of possible duplicate entries of the already existing ID.

FreelanceReinh
Jade | Level 19

@Kurt_Bremser wrote:

Is there already an ID variable in the dataset which you want to mask with the new random ID? If yes, @Ksharp 's code needs to be expanded to take care of possible duplicate entries of the already existing ID.


Good point from @Kurt_Bremser. In the case of an existing ID in the same range (1, 2, ..., 10**8) the RANUNI approach is particularly simple thanks to the CALL RANUNI routine, which can compute the new ID from the old one:

/* Create sample data for demonstration */

data have;
input ID othervar $;
cards;
 12345678 A1
 11111111 B1
100000000 C1
        1 D1
 87654321 E1
 12345678 A2
 11111111 B2
100000000 C2
        1 D2
 87654321 E2
;

/* Create unique pseudo-random identifier ID2 between 1 and 10**8 for each existing ID in the same range */

data want;
set have;
ID2=ID;
do until(ID2<=1e8);
  call ranuni(ID2, _n_);
end;
run;

Result:

Obs           ID    othervar       ID2

  1     12345678       A1       69598248
  2     11111111       B1       13830595
  3    100000000       C1       60730215
  4            1       D1       48774337
  5     87654321       E1        2209083
  6     12345678       A2       69598248
  7     11111111       B2       13830595
  8    100000000       C2       60730215
  9            1       D2       48774337
 10     87654321       E2        2209083

 

Edit: Moreover, you can compute ("reconstruct") the original ID from the new ID2. There is no SAS-supplied function or CALL routine for that, but the necessary calculations can be done in a DATA step:

/* Reconstruct the original ID from the new ID2 */

%let c=58743242 ; /* multiplicative inverse of the "multiplier" 397204094 found in RANUNI documentation, modulo 2**31-1 */
%let d=%sysevalf(2**31-1); /* =2147483647 */

data check(drop=_:);
/* Preparation: compute 10**k * 58743242 modulo 2**31-1, k=1, ..., 9 */
array _a[9] _temporary_;
if _n_=1 then do;
  _a[1]=mod(10*&c, &d);
  do _k=2 to 9;
    _a[_k]=mod(10*_a[_k-1], &d);
  end;
end;
set want;
_m=ID2;
do until(orig_ID<=1e8);
  _t=put(_m,10.);
  _s=input(char(_t,10), 1.)*&c;
  do _k=1 to length(left(_t))-1;
    _s+input(char(_t,10-_k), 1.)*_a[_k];
  end;
  orig_ID=mod(_s,&d);
  _m=orig_ID;
end;
run;

 

mkeintz
PROC Star

Here's a program that leverages the hash approach suggested by @Ksharp .  It assumes there is an old ID variable, and also that multiple observations can have the same old ID value, and therefore should be assigned the same randomly determined NEWID value.

 

It creates dataset WANT with the NEWID (but not the old ID), and a second dataset LOOKUP with all the ID/NEWID pairs:

 

data want (drop=id);  /*But keep NEWID*/

  set have  end=end_of_have;

  if _n_=1 then do;
    declare hash LNK (ordered:'A');
      LNK.definekey('id');
      LNK.definedata('id','newid');
      LNK.definedone();
    declare hash NEW ();
      NEW.definekey('newid');
      NEW.definedone();
  end;

  call streaminit(20240717);
  if LNK.find()^=0 then do;
    do until (NEW.check()^=0);
      newID = RAND('Integer',(10 ** 8));
    end;
    NEW.add();
    LNK.add();
  end;

  if end_of_have then LNK.output(dataset:'lookup');
run;

Since the LOOKUP dataset is a 1:1 pairing, it can be used to find NEWID for each ID, ... and the reverse.

 

In the case of very large cardinality of the old ID variable.

  1. This also assumes that you have no more than 10**8 unique old ID values.
  2. The closer the cardinality of your old ID gets to 10**8, the slower this process will be.
  3. And the greater risk there will be that the hash object will require more memory than available.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 486 views
  • 1 like
  • 5 in conversation