BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MHines
Obsidian | Level 7

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You can use the SUM statement, which has an implicit retain. The retain forces SAS to keep the values across rows and continuously add to it.

 

FYI - I'm assuming your SET statement is incorrect and DATA isn't required.

Full details are available here if you're not sure how this works:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

data want;

set have;

by idnumber;

*not required as sum does it implicitly;
retain counter;

if first.idnumber then counter=0;
else counter+1;

run;

@MHines wrote:

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!


 

 

View solution in original post

3 REPLIES 3
Reeza
Super User

You can use the SUM statement, which has an implicit retain. The retain forces SAS to keep the values across rows and continuously add to it.

 

FYI - I'm assuming your SET statement is incorrect and DATA isn't required.

Full details are available here if you're not sure how this works:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

data want;

set have;

by idnumber;

*not required as sum does it implicitly;
retain counter;

if first.idnumber then counter=0;
else counter+1;

run;

@MHines wrote:

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!


 

 

MHines
Obsidian | Level 7

Thank you, this is exactly what I needed! And yes you are correct -- that was a typo in my set statement.

data_null__
Jade | Level 19

Why not just index all starting at 0.  If you don't care about anything past FLAG=3 it won't matter what value the flag has.

 


@MHines wrote:

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!


 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 920 views
  • 0 likes
  • 3 in conversation