BookmarkSubscribeRSS Feed
sasuser123
Calcite | Level 5

I have a dataset with variables First_Name Last_Name DOB Gender.  I am trying to create a unique ID for rows with identical First_Name Last_Name DOB and  Gender

 

DATA outdata;
INPUT First_Name $ Last_Name $ DOB Gender$;
DATALINES;
TestF TestL 3/3/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 F
TestF1 TestL1 3/9/2010 F
TestF TestL 3/3/2010 F;

run;

 

desired output:

data                                               uniqid

=======                                        =====

TestF TestL 3/3/2010 F                      1
TestF TestL 3/3/2010 F                      1
TestF1 TestL1 3/9/2010 F                  2
TestF1 TestL1 3/9/2010 F                  2
TestF TestL 3/3/2010 F                      1

TestF1 TestL1 3/9/2010 M                  3

 

 

2 REPLIES 2
PaigeMiller
Diamond | Level 26

You can sort so that all records with identical First_Name Last_Name DOB Gender are together, then this can be accomplished as follows:

 

proc sort data=outdata;
    by first_name last_name dob gender;
run;

data want;
    set outdata;
    by first_name last_name dob gender;
    if first.gender then uniqid+1;
run;

 

--
Paige Miller
ballardw
Super User

First, make sure that your example data step runs. No format assigned for DOB means that you have invalid data on each record. Also the semicolon that ends the datalines block must be on a separate line.

Second, make sure that the INPUT data matches the OUTPUT, you did not include that M record in the input;

 

One way:

DATA outdata;
INPUT First_Name $ Last_Name $ DOB :mmddyy10. Gender$;
format dob mmddyy10.;
DATALINES;
TestF TestL 3/3/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 F
TestF1 TestL1 3/9/2010 F
TestF TestL 3/3/2010 F
TestF1 TestL1 3/9/2010 M 
;

run;

Proc sort data=outdata;
   by last_name first_name dob gender;
run;

data want;
   set outdata;
   retain uniqid 0;
   by last_name first_name dob gender;
   if first.gender then uniqid+1;
run;

BY group processing in a data step establishes automatic variables that indicate whether the current value is the first or last of the group. Those are accessed with First.variablename or Last.variablename. The values are 1 or 0 when "true", i.e. the first or the last, and SAS will use those as logical true/false. The Retain establishes a variable that persists across data step boundaries so that once the value is assigned it is kept until changed. BY group processing for this process would want the data to be in order to keep all the identity  variables together, hence the sort.

 

If the output order of the data set is critical, state so but it is relatively trivial to add an order variable(before the sort) and resort the data back to that order.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1581 views
  • 0 likes
  • 3 in conversation