
Creating a unique ID with CAS DATA Step

Started 05-01-2020
Modified 05-01-2020

CAS is massively parallel, spreading its processing across multiple threads on multiple machines, and this affects the way you need to write your CAS DATA Step code. Let's look at how we might create a unique row identifier in CAS versus SAS.


SAS DATA Step

Creating a unique ID in SAS DATA Step is quite simple. We just use the _n_ automatic variable as shown below:


DATA tableWithUniqueID;
SET tableWithOutUniqueID; 

        uniqueID = _n_;

run;


CAS DATA Step

Creating a unique ID in CAS DATA Step is more complicated. Each thread maintains its own _n_, so if we just use _n_, we'll get duplicate IDs: every thread produces a uniqueID of 1, then a uniqueID of 2, and so on. When the thread output is combined, we'll have a bunch of records with a uniqueID of 1, a bunch with a uniqueID of 2, and so on, which is not useful. To produce a truly unique ID, you need to augment _n_ with something else. CAS has its own set of automatic variables for such purposes. The _threadID_ automatic variable can help us get our unique ID as shown below:


DATA tableWithUniqueID;
SET tableWithOutUniqueID;

        /* _threadid_ identifies the thread; _n_ counts DATA step iterations on that thread */
        uniqueID = put(_threadid_,8.) || '_' || put(_n_,8.);

run;

While there are surely other ways of doing it, concatenating _threadID_ with _n_ ensures uniqueness because _threadID_ uniquely identifies a single thread and _n_ uniquely identifies a single row output by that thread.
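One way to sanity-check the result is to compare the row count with the number of distinct IDs. The sketch below is only an illustration: it assumes the output table can be read from your SAS session under the same name used above (add a libref if yours lives in a caslib), and the column aliases n_rows and n_unique_ids are made up for this example.

```sas
/* Sanity check (illustrative): the two counts should be equal     */
/* if every row received its own uniqueID.                         */
/* Assumption: tableWithUniqueID is readable under this name from  */
/* the current SAS session; qualify it with a libref if needed.    */
proc sql;
    select count(*)                 as n_rows,
           count(distinct uniqueID) as n_unique_ids
    from tableWithUniqueID;
quit;
```

If the two numbers differ, some ID was assigned more than once. For very large tables a check that runs inside CAS may be preferable, since PROC SQL may need to bring the rows back to the SAS session.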

 

For more information on threads and CAS DATA Step, see this blog post.

Comments

Very useful for CAS, thank you.

If you want the uniqueID to be a number instead of a varchar, you can create it using the Cantor pairing function: π(a,b) = 1/2 (a+b)(a+b+1) + b

The Cantor pairing function maps every pair of non-negative integers to a distinct integer, so each combination of _threadid_ and _n_ gets its own ID.

 

uniqueID = 1/2 * (_threadid_ + _n_) * (_threadid_ + _n_ + 1) + _n_;
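To see how the pairing formula above keeps IDs apart, here is a small worked example with a standing in for _threadid_ and b for _n_. The specific input values are chosen purely for illustration.

```latex
\begin{align*}
\pi(a,b) &= \tfrac{1}{2}(a+b)(a+b+1) + b \\
\pi(1,1) &= \tfrac{1}{2}(2)(3) + 1 = 4 \\
\pi(1,2) &= \tfrac{1}{2}(3)(4) + 2 = 8 \\
\pi(2,1) &= \tfrac{1}{2}(3)(4) + 1 = 7
\end{align*}
```

Note that (1,2) and (2,1) have the same sum but still get different IDs, because the trailing + b term separates pairs that share a sum. One caveat, stated as a general property of 8-byte floating-point numbers rather than anything specific to this post: integers are represented exactly only up to 2^53, so the result stays exact only while roughly (a+b)^2 / 2 remains below that limit.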

 

Last update: 05-01-2020 03:46 PM
