Hello,
I want to create a compact unique id value? For example if I have a dataset with a billion rows - I could create a variable named id with a value from 1-1,000,000,000
(0000000001 to 1000000000) . But I don't want my variable to be $10 - I want my variable to be less than 10. I was thinking of using {A-Z, 0-9} to come up with unique id combinations.
Therefore I would have 36 unique characters to build the id variable with. I then figured out that I can create an id value of $6 length (36^6 = 2,176,782,336 unique 6 char combinations). The $5 would not provide at least 1 billion unique values.
I played around with arrays and macro lists but didn't get anywhere. I am stuck on how to start building this new id variable - does anyone have ideas?
Julie
Seems to me all you need is bunch of (in this case, 6) do-loops:
data _null_;
length str $ 200;
do i=48 to 57, 65 to 90;
bt=byte(i);
str=catx(',', str, quote(bt));
end;
call symputx('str', str);
run;
data uniq_id;
length ID $ 6;
do i1=&str.;
do i2=&str.;
do i3=&str.;
do i4=&str.;
do i5=&str.;
do i6=&str.;
id=cats(of i1-i6);
output;
end;
end;
end;
end;
end;
end;
run;
On my 7-years old box, it shows:
NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 5:18.71
cpu time 4:57.10
The UUIDGEN function meets some of your criteria. See this blog post:
131f5c0e12344f8b9c3a02014663677a
afb7719250bd446b8c7d1285b9f773bb
18f3332aff7448a4bc2e3a41eb1653f8
223ce6024cd74038bfc9a5d2523524d1
efb82bb00d374181b36111e4ad7150db
9c412af5fb7140d4995ff25aec13969c
8dfd6b1b7d3045d887d7481a3faa4548
125250a10f814aa2b72ac5f727049e0b
5bd93cc573524a6589bead495f301f3d
7f3c6f4517f141e381d4f2c4a0c4bbb3
The downside is that the length will be $32 -- that's longer than you asked for. But the result is guaranteed unique and you've saved your own brainpower.
Another option is to use a HEX8. format, more concise than Decimal:
x id
999999990 3B9AC9F6
999999991 3B9AC9F7
999999992 3B9AC9F8
999999993 3B9AC9F9
999999994 3B9AC9FA
999999995 3B9AC9FB
999999996 3B9AC9FC
999999997 3B9AC9FD
999999998 3B9AC9FE
999999999 3B9AC9FF
1000000000 3B9ACA00
Chris
Thank you. This really helpful to know - I have never come across the UUIDGEN function before.
Julie
Seems to me all you need is bunch of (in this case, 6) do-loops:
data _null_;
length str $ 200;
do i=48 to 57, 65 to 90;
bt=byte(i);
str=catx(',', str, quote(bt));
end;
call symputx('str', str);
run;
data uniq_id;
length ID $ 6;
do i1=&str.;
do i2=&str.;
do i3=&str.;
do i4=&str.;
do i5=&str.;
do i6=&str.;
id=cats(of i1-i6);
output;
end;
end;
end;
end;
end;
end;
run;
On my 7-years old box, it shows:
NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 5:18.71
cpu time 4:57.10
You are a SAS Rock Star - This is exactly what I wanted.
Now that I see the code - it is perfectly clear how to do this.
It took over 8 minutes on my 2 month old dell.
Thanks, Julie
Glad it worked out for ya. And don't feel bad about your Dell, which I assume it is for office use. So it may not be fair comparison. My box was a gaming setup, it has 4 core cpu and 8G ram, that's a pretty big deal back to 7 years ago when mainstream market started promoting 2xcore CPU and 2G RAM machines.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.