Hello,
I am trying to create a unique ID variable for each observation in my dataset. However, I want rows that have the same "DOB_name" value to have the same ID. When the "DOB_name" field is missing, I want the unique ID to be different for each of those rows.
An example of my input data and what I want the ID variable to be:
data want; input DOB_name ID datalines; 24APR2007_JOHN 1 07FEB2009_SUSIE 2 30DEC2002_MACIE 3 30DEC2002_MACIE 3 . 4 . 5 . 6 ; run;
Thanks for your help,
Clare
data want;
set have;
by dob_name notsorted;
if missing(dob_name) or first.dobname then id+1;
run;
data want;
set have;
by dob_name notsorted;
if missing(dob_name) or first.dobname then id+1;
run;
Your data WANT step doesn't read the data you provide, so I have created data set HAVE from sashelp.class, with a few recurring NAME values. The DATA WANT step shows how to create a hash object (think lookup table stored in memory). When the h.find() method fails, it returns a non-zero, meaning the hash object does not yet have that name. So it retrieves the number of items already in H, adds 1 to generate an ID for the new record, and stores it in H for later access.
But if the h.find() is successful (returns a zero), then the ID stored in the hash object has been "retrieved" from h and has set the variable ID to the ID value assigned to the name when it first appeared.
I initially forgot about the need to assign unique ID's to each missing value. The code below has been corrected to fulfill that requirement.
Note this does NOT need the data to be sorted by the variable used to generate the ID.
data have;
do until (end_of_class);
set sashelp.class end=end_of_class;
output;
end;
name=' '; output;
do until (end_of_class2);
set sashelp.class end=end_of_class2;
if age=14 then output;
end;
name=' '; output;
run;
data want;
set have;
if _n_=1 then do;
declare hash h ();
h.definekey('name');
h.definedata('id');
h.definedone();
end;
if h.find()^=0 then do;
id=h.num_items+1;
if missing(name) then h.add(key:cats(.,_n_),data:id);
else h.add();
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.