Solved: Re: building compact unique id using A-Z,0-9

Julie4435637 · Posted 02-24-2015 04:30 PM

Hello,

I want to create a compact unique id value? For example if I have a dataset with a billion rows - I could create a variable named id with a value from 1-1,000,000,000

(0000000001 to 1000000000) . But I don't want my variable to be $10 - I want my variable to be less than 10. I was thinking of using {A-Z, 0-9} to come up with unique id combinations.

Therefore I would have 36 unique characters to build the id variable with. I then figured out that I can create an id value of $6 length (36^6 = 2,176,782,336 unique 6 char combinations). The $5 would not provide at least 1 billion unique values.

I played around with arrays and macro lists but didn't get anywhere. I am stuck on how to start building this new id variable - does anyone have ideas?

Julie

Haikuo · Posted 02-24-2015 09:51 PM

Seems to me all you need is bunch of (in this case, 6) do-loops:

data _null_;

length str $ 200;

do i=48 to 57, 65 to 90;

bt=byte(i);

str=catx(',', str, quote(bt));

end;

call symputx('str', str);

run;

data uniq_id;

length ID $ 6;

do i1=&str.;

do i2=&str.;

do i3=&str.;

do i4=&str.;

do i5=&str.;

do i6=&str.;

id=cats(of i1-i6);

output;

end;

run;

On my 7-years old box, it shows:

NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.

NOTE: DATA statement used (Total process time):

real time 5:18.71

cpu time 4:57.10

View solution in original post

ChrisHemedinger · Posted 02-24-2015 05:38 PM

The UUIDGEN function meets some of your criteria. See this blog post:

http://blogs.sas.com/content/sasdummy/2012/10/19/creating-a-somewhat-unique-id-using-the-uuidgen-fun...

data wastedGuids;
do x=1 to 10; 
  guid = compress(uuidgen(),'-');
  output; 
end; 
run;

/* OUTPUT of 10 records: 

131f5c0e12344f8b9c3a02014663677a
afb7719250bd446b8c7d1285b9f773bb
18f3332aff7448a4bc2e3a41eb1653f8
223ce6024cd74038bfc9a5d2523524d1
efb82bb00d374181b36111e4ad7150db
9c412af5fb7140d4995ff25aec13969c
8dfd6b1b7d3045d887d7481a3faa4548
125250a10f814aa2b72ac5f727049e0b
5bd93cc573524a6589bead495f301f3d
7f3c6f4517f141e381d4f2c4a0c4bbb3

*/

The downside is that the length will be $32 -- that's longer than you asked for. But the result is guaranteed unique and you've saved your own brainpower.

Another option is to use a HEX8. format, more concise than Decimal:

data last10_IDs;
do x=999999990 to 1000000000; 
  id = put (x,hex8.);
  output; 
end; 
run;

       x           id
   999999990    3B9AC9F6
   999999991    3B9AC9F7
   999999992    3B9AC9F8
   999999993    3B9AC9F9
   999999994    3B9AC9FA
   999999995    3B9AC9FB
   999999996    3B9AC9FC
   999999997    3B9AC9FD
   999999998    3B9AC9FE
   999999999    3B9AC9FF
  1000000000    3B9ACA00

Chris

Become an Explorer! Join SAS Analytics Explorers to learn and complete challenges that earn rewards!

Julie4435637 · Posted 02-25-2015 09:29 AM

Thank you. This really helpful to know - I have never come across the UUIDGEN function before.

Julie

Haikuo · Posted 02-24-2015 09:51 PM

Seems to me all you need is bunch of (in this case, 6) do-loops:

data _null_;

length str $ 200;

do i=48 to 57, 65 to 90;

bt=byte(i);

str=catx(',', str, quote(bt));

end;

call symputx('str', str);

run;

data uniq_id;

length ID $ 6;

do i1=&str.;

do i2=&str.;

do i3=&str.;

do i4=&str.;

do i5=&str.;

do i6=&str.;

id=cats(of i1-i6);

output;

end;

run;

On my 7-years old box, it shows:

NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.

NOTE: DATA statement used (Total process time):

real time 5:18.71

cpu time 4:57.10

Julie4435637 · Posted 02-25-2015 09:32 AM

You are a SAS Rock Star - This is exactly what I wanted.

Now that I see the code - it is perfectly clear how to do this.

It took over 8 minutes on my 2 month old dell.

Thanks, Julie

Haikuo · Posted 02-25-2015 12:13 PM

Glad it worked out for ya. And don't feel bad about your Dell, which I assume it is for office use. So it may not be fair comparison. My box was a gaming setup, it has 4 core cpu and 8G ram, that's a pretty big deal back to 7 years ago when mainstream market started promoting 2xcore CPU and 2G RAM machines.

building compact unique id using A-Z,0-9

Re: building compact unique id using A-Z,0-9

Re: building compact unique id using A-Z,0-9

Re: building compact unique id using A-Z,0-9

Re: building compact unique id using A-Z,0-9

Re: building compact unique id using A-Z,0-9

Re: building compact unique id using A-Z,0-9

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away