BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MHines
Obsidian | Level 7

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You can use the SUM statement, which has an implicit retain. The retain forces SAS to keep the values across rows and continuously add to it.

 

FYI - I'm assuming your SET statement is incorrect and DATA isn't required.

Full details are available here if you're not sure how this works:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

data want;

set have;

by idnumber;

*not required as sum does it implicitly;
retain counter;

if first.idnumber then counter=0;
else counter+1;

run;

@MHines wrote:

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!


 

 

View solution in original post

3 REPLIES 3
Reeza
Super User

You can use the SUM statement, which has an implicit retain. The retain forces SAS to keep the values across rows and continuously add to it.

 

FYI - I'm assuming your SET statement is incorrect and DATA isn't required.

Full details are available here if you're not sure how this works:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

data want;

set have;

by idnumber;

*not required as sum does it implicitly;
retain counter;

if first.idnumber then counter=0;
else counter+1;

run;

@MHines wrote:

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!


 

 

MHines
Obsidian | Level 7

Thank you, this is exactly what I needed! And yes you are correct -- that was a typo in my set statement.

data_null__
Jade | Level 19

Why not just index all starting at 0.  If you don't care about anything past FLAG=3 it won't matter what value the flag has.

 


@MHines wrote:

Hello,

 

I am analyzing a dataset with multiple duplicate ID numbers in it and am trying to create a flag such that, if an ID appears four times for example, the first appearance is flagged 0, second is flagged 1, third is flagged 2, and fourth is flagged 3. I have approximately 139,000 unique IDs and 356,000 rows total. About 40% of my IDs are unique, but the number of observations per ID ranges from 1 to 98 (I only really care about the first three observations per ID, but I need to be able to differentiate between the second and third). I successfully created a flag that identifies the duplicates, but it does not differentiate between the second and all duplicates following it (it's just a 0 /1 binary indicator currently, where =1 tells me the row is a duplicate). Here is my current code:

 

data want;

set data have;

by idnumber;

if first.idnumber=0 or last.idnumber=0 then dupflag = 1;

else flag=0;

run;

 

Any assistance is appreciated. Thank you!


 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 687 views
  • 0 likes
  • 3 in conversation