BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Julie4435637
Obsidian | Level 7

Hello,

I want to create a compact unique id value?  For example if  I have a dataset with a billion rows - I could create a variable named id with a value from 1-1,000,000,000

(0000000001 to 1000000000) .  But I don't want my variable to be $10 - I want my variable to be less than 10.  I was thinking of using {A-Z, 0-9} to come up with unique id combinations.

Therefore I would have 36 unique characters to build the id variable with. I then figured out that I  can create an id value of $6 length (36^6 = 2,176,782,336 unique 6 char combinations). The $5 would not provide at least 1 billion unique values.

I played around with arrays and macro lists but didn't get anywhere.  I am stuck on how to start building this new id variable - does anyone have ideas?

Julie

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Seems to me all you need is bunch of (in this case, 6) do-loops:

data _null_;

     length str $ 200;

     do i=48 to 57, 65 to 90;

           bt=byte(i);

           str=catx(',', str, quote(bt));

     end;

     call symputx('str', str);

run;

data uniq_id;

     length ID  $ 6;

     do i1=&str.;

           do i2=&str.;

                do i3=&str.;

                     do i4=&str.;

                           do i5=&str.;

                                do i6=&str.;

                                     id=cats(of i1-i6);

                                     output;

                                end;

                           end;

                     end;

                end;

           end;

     end;

run;


On my 7-years old box, it shows:

NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           5:18.71

      cpu time            4:57.10



View solution in original post

5 REPLIES 5
ChrisHemedinger
Community Manager

The UUIDGEN function meets some of your criteria.  See this blog post:

http://blogs.sas.com/content/sasdummy/2012/10/19/creating-a-somewhat-unique-id-using-the-uuidgen-fun...

data wastedGuids;
do x=1 to 10;
  guid = compress(uuidgen(),
'-');
  output;
end;
run;

/* OUTPUT of 10 records:

131f5c0e12344f8b9c3a02014663677a

afb7719250bd446b8c7d1285b9f773bb

18f3332aff7448a4bc2e3a41eb1653f8

223ce6024cd74038bfc9a5d2523524d1

efb82bb00d374181b36111e4ad7150db

9c412af5fb7140d4995ff25aec13969c

8dfd6b1b7d3045d887d7481a3faa4548

125250a10f814aa2b72ac5f727049e0b

5bd93cc573524a6589bead495f301f3d

7f3c6f4517f141e381d4f2c4a0c4bbb3

*/

The downside is that the length will be $32 -- that's longer than you asked for.  But the result is guaranteed unique and you've saved your own brainpower.

Another option is to use a HEX8. format, more concise than Decimal:

data last10_IDs;
do x=999999990 to 1000000000;
  id = put (x,
hex8.);
  output;
end;
run;

       x           id

   999999990    3B9AC9F6

   999999991    3B9AC9F7

   999999992    3B9AC9F8

   999999993    3B9AC9F9

   999999994    3B9AC9FA

   999999995    3B9AC9FB

   999999996    3B9AC9FC

   999999997    3B9AC9FD

   999999998    3B9AC9FE

   999999999    3B9AC9FF

  1000000000    3B9ACA00

Chris

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
Julie4435637
Obsidian | Level 7

Thank you.  This really helpful to know - I have never come across the UUIDGEN function before.

Julie

Haikuo
Onyx | Level 15

Seems to me all you need is bunch of (in this case, 6) do-loops:

data _null_;

     length str $ 200;

     do i=48 to 57, 65 to 90;

           bt=byte(i);

           str=catx(',', str, quote(bt));

     end;

     call symputx('str', str);

run;

data uniq_id;

     length ID  $ 6;

     do i1=&str.;

           do i2=&str.;

                do i3=&str.;

                     do i4=&str.;

                           do i5=&str.;

                                do i6=&str.;

                                     id=cats(of i1-i6);

                                     output;

                                end;

                           end;

                     end;

                end;

           end;

     end;

run;


On my 7-years old box, it shows:

NOTE: The data set WORK.UNIQ_ID has 2176782336 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           5:18.71

      cpu time            4:57.10



Julie4435637
Obsidian | Level 7

You are a SAS Rock Star - This is exactly what I wanted.

Now that I see the code - it is perfectly clear how to do this.

It took over 8 minutes on my 2 month old dell.

Thanks,  Julie

Haikuo
Onyx | Level 15

Glad it worked out for ya. And don't feel bad about your Dell, which I assume it is for office use. So it may not be fair comparison. My box was a gaming setup, it has 4 core cpu and 8G ram, that's a pretty big deal back to 7 years ago when mainstream market started promoting 2xcore CPU and 2G RAM machines.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1297 views
  • 5 likes
  • 3 in conversation