I have a dataset with variables First_Name Last_Name DOB Gender. I am trying to create a unique ID for rows with identical First_Name Last_Name DOB and Gender
DATA outdata;
INPUT First_Name $ Last_Name $ DOB Gender$;
DATALINES;
TestF TestL 3/3/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 F
TestF1 TestL1 3/9/2010 F
TestF TestL 3/3/2010 F;
run;
desired output:
data uniqid
======= =====
TestF TestL 3/3/2010 F 1
TestF TestL 3/3/2010 F 1
TestF1 TestL1 3/9/2010 F 2
TestF1 TestL1 3/9/2010 F 2
TestF TestL 3/3/2010 F 1
TestF1 TestL1 3/9/2010 M 3
You can sort so that all records with identical First_Name Last_Name DOB Gender are together, then this can be accomplished as follows:
proc sort data=outdata;
by first_name last_name dob gender;
run;
data want;
set outdata;
by first_name last_name dob gender;
if first.gender then uniqid+1;
run;
First, make sure that your example data step runs. No format assigned for DOB means that you have invalid data on each record. Also the semicolon that ends the datalines block must be on a separate line.
Second, make sure that the INPUT data matches the OUTPUT, you did not include that M record in the input;
One way:
DATA outdata; INPUT First_Name $ Last_Name $ DOB :mmddyy10. Gender$; format dob mmddyy10.; DATALINES; TestF TestL 3/3/2010 F TestF TestL 3/3/2010 F TestF1 TestL1 3/9/2010 F TestF1 TestL1 3/9/2010 F TestF TestL 3/3/2010 F TestF1 TestL1 3/9/2010 M ; run; Proc sort data=outdata; by last_name first_name dob gender; run; data want; set outdata; retain uniqid 0; by last_name first_name dob gender; if first.gender then uniqid+1; run;
BY group processing in a data step establishes automatic variables that indicate whether the current value is the first or last of the group. Those are accessed with First.variablename or Last.variablename. The values are 1 or 0 when "true", i.e. the first or the last, and SAS will use those as logical true/false. The Retain establishes a variable that persists across data step boundaries so that once the value is assigned it is kept until changed. BY group processing for this process would want the data to be in order to keep all the identity variables together, hence the sort.
If the output order of the data set is critical, state so but it is relatively trivial to add an order variable(before the sort) and resort the data back to that order.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.