BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
claremc
Obsidian | Level 7

Hello, 

 

I am trying to create a unique ID variable for each observation in my dataset. However, I want rows that have the same "DOB_name" value to have the same ID. When the "DOB_name" field is missing, I want the unique ID to be different for each of those rows. 

An example of my input data and what I want the ID variable to be: 

data want;
input  DOB_name ID
datalines;
24APR2007_JOHN 1 
07FEB2009_SUSIE 2
30DEC2002_MACIE 3
30DEC2002_MACIE 3
. 4
. 5
. 6
;
run;

Thanks for your help, 

Clare

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data want;
 set have;
by dob_name notsorted;
 if missing(dob_name) or first.dobname then id+1;
run;

View solution in original post

3 REPLIES 3
novinosrin
Tourmaline | Level 20
data want;
 set have;
by dob_name notsorted;
 if missing(dob_name) or first.dobname then id+1;
run;
mkeintz
PROC Star

Your data WANT step doesn't read the data you provide, so I have created data set HAVE from sashelp.class, with a few recurring NAME values.  The DATA WANT step shows how to create a hash object (think lookup table stored in memory).  When the h.find() method fails, it returns a non-zero, meaning the hash object does not yet have that name. So it retrieves the number of items already in H, adds 1 to generate an ID for the new record, and stores it in H for later access.

 

But if the h.find() is successful (returns a zero), then the ID stored in the hash object has been "retrieved" from h and has set the variable ID to the ID value assigned to the name when it first appeared.

 

I initially forgot about the need to assign unique ID's to each missing value.  The code below has been corrected to fulfill that requirement.

 

Note this does NOT need the data to be sorted by the variable used to generate the ID.

 


data have; do until (end_of_class); set sashelp.class end=end_of_class; output; end; name=' '; output; do until (end_of_class2); set sashelp.class end=end_of_class2; if age=14 then output; end; name=' '; output; run; data want; set have; if _n_=1 then do; declare hash h (); h.definekey('name'); h.definedata('id'); h.definedone(); end; if h.find()^=0 then do; id=h.num_items+1; if missing(name) then h.add(key:cats(.,_n_),data:id); else h.add(); end; run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1086 views
  • 1 like
  • 3 in conversation