Hello,
I am attempting to create a unique, random ID variable for all existing IDs within dataset y. I have played around with a bunch of code however, I get a few duplicate IDs each time. How do I transform the below code to produce a unique, random number between 1 and 10**8? I have used streaminit(1) to ensure the same random numbers are generated on each run.
Data x;
Set y;
Call streaminit(1);
ID = RAND('Integer',(10 ** 8));
Run;
Thank you in advance,
You could try PROC SURVEYSELECT:
data have;
set sashelp.class;
run;
data id;
do id=1 to 10**8;
output;
end;
run;
%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid.,nlobs));
%let dsid=%sysfunc(close(&dsid.));
proc surveyselect data=id out=id2 seed=123 noprint sampsize=&nobs. outrandom;
run;
data want;
merge have id2;
run;
Here is another way:
Data x;
if _n_=1 then do;
length id 8;
declare hash h(hashexp:20);
h.definekey(key:'id');
h.definedone();
end;
Set sashelp.class;
Call streaminit(1);
do until(rc ne 0);
ID = RAND('Integer',1,10**8);
rc=h.check();
end;
h.add();
drop rc;
Run;
Hello @ALEXIA1,
If your new IDs just need to "look like" random numbers, you can also resort to the (deprecated) RANUNI function, which creates up to 2,147,483,646 unique pseudo-random numbers by default:
data x;
set y;
do until(ID <= 1e8);
ID = ranuni(2718)*(2**31-1);
end;
run;
Is there already an ID variable in the dataset which you want to mask with the new random ID? If yes, @Ksharp 's code needs to be expanded to take care of possible duplicate entries of the already existing ID.
@Kurt_Bremser wrote:
Is there already an ID variable in the dataset which you want to mask with the new random ID? If yes, @Ksharp 's code needs to be expanded to take care of possible duplicate entries of the already existing ID.
Good point from @Kurt_Bremser. In the case of an existing ID in the same range (1, 2, ..., 10**8) the RANUNI approach is particularly simple thanks to the CALL RANUNI routine, which can compute the new ID from the old one:
/* Create sample data for demonstration */
data have;
input ID othervar $;
cards;
12345678 A1
11111111 B1
100000000 C1
1 D1
87654321 E1
12345678 A2
11111111 B2
100000000 C2
1 D2
87654321 E2
;
/* Create unique pseudo-random identifier ID2 between 1 and 10**8 for each existing ID in the same range */
data want;
set have;
ID2=ID;
do until(ID2<=1e8);
call ranuni(ID2, _n_);
end;
run;
Result:
Obs ID othervar ID2 1 12345678 A1 69598248 2 11111111 B1 13830595 3 100000000 C1 60730215 4 1 D1 48774337 5 87654321 E1 2209083 6 12345678 A2 69598248 7 11111111 B2 13830595 8 100000000 C2 60730215 9 1 D2 48774337 10 87654321 E2 2209083
Edit: Moreover, you can compute ("reconstruct") the original ID from the new ID2. There is no SAS-supplied function or CALL routine for that, but the necessary calculations can be done in a DATA step:
/* Reconstruct the original ID from the new ID2 */
%let c=58743242 ; /* multiplicative inverse of the "multiplier" 397204094 found in RANUNI documentation, modulo 2**31-1 */
%let d=%sysevalf(2**31-1); /* =2147483647 */
data check(drop=_:);
/* Preparation: compute 10**k * 58743242 modulo 2**31-1, k=1, ..., 9 */
array _a[9] _temporary_;
if _n_=1 then do;
_a[1]=mod(10*&c, &d);
do _k=2 to 9;
_a[_k]=mod(10*_a[_k-1], &d);
end;
end;
set want;
_m=ID2;
do until(orig_ID<=1e8);
_t=put(_m,10.);
_s=input(char(_t,10), 1.)*&c;
do _k=1 to length(left(_t))-1;
_s+input(char(_t,10-_k), 1.)*_a[_k];
end;
orig_ID=mod(_s,&d);
_m=orig_ID;
end;
run;
Here's a program that leverages the hash approach suggested by @Ksharp . It assumes there is an old ID variable, and also that multiple observations can have the same old ID value, and therefore should be assigned the same randomly determined NEWID value.
It creates dataset WANT with the NEWID (but not the old ID), and a second dataset LOOKUP with all the ID/NEWID pairs:
data want (drop=id); /*But keep NEWID*/
set have end=end_of_have;
if _n_=1 then do;
declare hash LNK (ordered:'A');
LNK.definekey('id');
LNK.definedata('id','newid');
LNK.definedone();
declare hash NEW ();
NEW.definekey('newid');
NEW.definedone();
end;
call streaminit(20240717);
if LNK.find()^=0 then do;
do until (NEW.check()^=0);
newID = RAND('Integer',(10 ** 8));
end;
NEW.add();
LNK.add();
end;
if end_of_have then LNK.output(dataset:'lookup');
run;
Since the LOOKUP dataset is a 1:1 pairing, it can be used to find NEWID for each ID, ... and the reverse.
In the case of very large cardinality of the old ID variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.