Hello,
I want to create a compact unique id value? For example if I have a dataset with a billion rows - I could create a variable named id with a value from 1-1,000,000,000
(0000000001 to 1000000000) . But I don't want my variable to be $10 - I want my variable to be less than 10. I was thinking of using {A-Z, 0-9} to come up with unique id combinations.
Therefore I would have 36 unique characters to build the id variable with. I then figured out that I can create an id value of $6 length (36^6 = 2,176,782,336 unique 6 char combinations). The $5 would not provide at least 1 billion unique values.
I played around with arrays and macro lists but didn't get anywhere. I am stuck on how to start building this new id variable - does anyone have ideas?
Julie
Seems to me all you need is bunch of (in this case, 6) do-loops:
data _null_;
length str $ 200;
do i=48 to 57, 65 to 90;
bt=byte(i);
str=catx(',', str, quote(bt));
end;
call symputx('str', str);
run;
data uniq_id;
length ID $ 6;
do i1=&str.;
do i2=&str.;
do i3=&str.;
do i4=&str.;
do i5=&str.;
do i6=&str.;
id=cats(of i1-i6);
output;
end;
end;
end;
end;
end;
end;
run;
On my 7-years old box, it shows:
NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 5:18.71
cpu time 4:57.10
The UUIDGEN function meets some of your criteria. See this blog post:
131f5c0e12344f8b9c3a02014663677a
afb7719250bd446b8c7d1285b9f773bb
18f3332aff7448a4bc2e3a41eb1653f8
223ce6024cd74038bfc9a5d2523524d1
efb82bb00d374181b36111e4ad7150db
9c412af5fb7140d4995ff25aec13969c
8dfd6b1b7d3045d887d7481a3faa4548
125250a10f814aa2b72ac5f727049e0b
5bd93cc573524a6589bead495f301f3d
7f3c6f4517f141e381d4f2c4a0c4bbb3
The downside is that the length will be $32 -- that's longer than you asked for. But the result is guaranteed unique and you've saved your own brainpower.
Another option is to use a HEX8. format, more concise than Decimal:
x id
999999990 3B9AC9F6
999999991 3B9AC9F7
999999992 3B9AC9F8
999999993 3B9AC9F9
999999994 3B9AC9FA
999999995 3B9AC9FB
999999996 3B9AC9FC
999999997 3B9AC9FD
999999998 3B9AC9FE
999999999 3B9AC9FF
1000000000 3B9ACA00
Chris
Thank you. This really helpful to know - I have never come across the UUIDGEN function before.
Julie
Seems to me all you need is bunch of (in this case, 6) do-loops:
data _null_;
length str $ 200;
do i=48 to 57, 65 to 90;
bt=byte(i);
str=catx(',', str, quote(bt));
end;
call symputx('str', str);
run;
data uniq_id;
length ID $ 6;
do i1=&str.;
do i2=&str.;
do i3=&str.;
do i4=&str.;
do i5=&str.;
do i6=&str.;
id=cats(of i1-i6);
output;
end;
end;
end;
end;
end;
end;
run;
On my 7-years old box, it shows:
NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 5:18.71
cpu time 4:57.10
You are a SAS Rock Star - This is exactly what I wanted.
Now that I see the code - it is perfectly clear how to do this.
It took over 8 minutes on my 2 month old dell.
Thanks, Julie
Glad it worked out for ya. And don't feel bad about your Dell, which I assume it is for office use. So it may not be fair comparison. My box was a gaming setup, it has 4 core cpu and 8G ram, that's a pretty big deal back to 7 years ago when mainstream market started promoting 2xcore CPU and 2G RAM machines.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.