Hi Everyone. I have this data below. How can I create Duplicate field in EG based on ID. This is how I did in program code. Thank you.
data DUPS;
set WORK.ACTIVE_RECORDS ;
BY ID;
format DUPLICATE $CHAR30.;
IF (FIRST.ID) THEN DUPLICATE='';
ELSE DUPLICATE = 'DUPS';
RUN;
ID CATEGORY SUB_ID DUPLICATE
5692 C1 7000001
5986 C2 7000002
6050 C1 7000003
6080 C2 7000004
7789 C4 7000005
7789 C4 7000006 DUPS
7790 C1 7000007
7791 C1 7000008
8695 C1 7000009
8695 C2 7000010 DUPS
9526 C1 7000011
9528 C1 7000012
9598 C1 7000013
9598 C2 7000014 DUPS
Are you asking how to create it using the GUI then? Or are you asking how to make your code more efficient or alternative code solutions?
@sasuser_sk wrote:
Hi Everyone. I have this data below. How can I create Duplicate field in EG based on ID. This is how I did in program code. Thank you.
data DUPS;
set WORK.ACTIVE_RECORDS ;
BY ID;
format DUPLICATE $CHAR30.;
IF (FIRST.ID) THEN DUPLICATE='';
ELSE DUPLICATE = 'DUPS';
RUN;
ID CATEGORY SUB_ID DUPLICATE
5692 C1 7000001
5986 C2 7000002
6050 C1 7000003
6080 C2 7000004
7789 C4 7000005
7789 C4 7000006 DUPS
7790 C1 7000007
7791 C1 7000008
8695 C1 7000009
8695 C2 7000010 DUPS
9526 C1 7000011
9528 C1 7000012
9598 C1 7000013
9598 C2 7000014 DUPS
Your program will work fine in EG so I don't understand your problem.
I am looking for a way to do this in query builder instead in EG as I do not want to create a program code for this, which is an extra step always. If there is a way to build same logic in query builder I would love to learn that.
The Query Builder only uses SQL and it isn't as easy that way. You can add your program as a Code Node in your EG project though.
If by "adding program as a Code Node" you mean something like in pic below then yes that is what I always do but due to the way my projects are I prefer to learn build similar concept in query builder. I make duplicate field very often when I make joins. I am willing to learn a harder way if it helps me keep my project clearer by building messy formulas in query builder.
Thanks Reeza. In the last I would be removing the duplicates. But after glancing over the data with dups I choose whether to keep the dup
value or exclude the other dup value that is blank.
I like both options :
maximum per group via a CASE statement, and
ranking
I have tried one or the other depending on the needs of the project and find them useful but just to avoid a few extra steps I was looking to create something quicker for DUPS in query builder.
But if no such concept (program Code) could be done in query builder then I'll make my peace with program code.
Thank you for suggestions.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.