BookmarkSubscribeRSS Feed
sasuser123
Calcite | Level 5

I have a dataset with variables First_Name Last_Name DOB Gender.  I am trying to create a unique ID for rows with identical First_Name Last_Name DOB and  Gender

 

DATA outdata;
INPUT First_Name $ Last_Name $ DOB Gender$;
DATALINES;
TestF TestL 3/3/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 F
TestF1 TestL1 3/9/2010 F
TestF TestL 3/3/2010 F;

run;

 

desired output:

data                                               uniqid

=======                                        =====

TestF TestL 3/3/2010 F                      1
TestF TestL 3/3/2010 F                      1
TestF1 TestL1 3/9/2010 F                  2
TestF1 TestL1 3/9/2010 F                  2
TestF TestL 3/3/2010 F                      1

TestF1 TestL1 3/9/2010 M                  3

 

 

2 REPLIES 2
PaigeMiller
Diamond | Level 26

You can sort so that all records with identical First_Name Last_Name DOB Gender are together, then this can be accomplished as follows:

 

proc sort data=outdata;
    by first_name last_name dob gender;
run;

data want;
    set outdata;
    by first_name last_name dob gender;
    if first.gender then uniqid+1;
run;

 

--
Paige Miller
ballardw
Super User

First, make sure that your example data step runs. No format assigned for DOB means that you have invalid data on each record. Also the semicolon that ends the datalines block must be on a separate line.

Second, make sure that the INPUT data matches the OUTPUT, you did not include that M record in the input;

 

One way:

DATA outdata;
INPUT First_Name $ Last_Name $ DOB :mmddyy10. Gender$;
format dob mmddyy10.;
DATALINES;
TestF TestL 3/3/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 F
TestF1 TestL1 3/9/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 M 
;

run;

Proc sort data=outdata;
   by last_name first_name dob gender;
run;

data want;
   set outdata;
   retain uniqid 0;
   by last_name first_name dob gender;
   if first.gender then uniqid+1;
run;

BY group processing in a data step establishes automatic variables that indicate whether the current value is the first or last of the group. Those are accessed with First.variablename or Last.variablename. The values are 1 or 0 when "true", i.e. the first or the last, and SAS will use those as logical true/false. The Retain establishes a variable that persists across data step boundaries so that once the value is assigned it is kept until changed. BY group processing for this process would want the data to be in order to keep all the identity  variables together, hence the sort.

 

If the output order of the data set is critical, state so but it is relatively trivial to add an order variable(before the sort) and resort the data back to that order.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 2276 views
  • 0 likes
  • 3 in conversation