Hello everyone!
I'm looking for some help in removing duplicates based on two logic.
Here is the data set:
data datadup;
input C_ID FDA P_ID;
datalines;
1 2001 85
1 2001 85
1 2099 85
1 2099 85
2 2054 85
2 2054 95
3 2043 54
4 2018 24
5 3039 93
6 3736 37
7 4645 36
7 4645 99
7 7869 74
7 7869 96
7 7869 96
8 9470 65
9 8676 23
10 8736 81
10 8736 93
10 8736 99
;
run;
LOGIC: For rows with the same C_ID, select those with highest FDA. For rows with the same C_ID and FDA,select the one with the highest P_ID.
The output should look like this:
data data_nodup;
input C_ID FDA P_ID;
datalines;
1 2099 85
1 2099 85
2 2054 95
3 2043 54
4 2018 24
5 3039 93
6 3736 37
7 4645 99
7 7869 96
7 7869 96
8 9470 65
9 8676 23
10 8736 99
;
run;
Thank you for your help!
hi @sandrube , @VDD 's approach is close, needs a small tweak if i understand your req correctly
data datadup;
input C_ID FDA P_ID cars$;
datalines;
1 2001 85 honda
1 2001 85 tesla
1 2099 85 toyota
1 2099 85 nissan
2 2054 85 toyota
2 2054 95 tesla
3 2043 54 toyota
4 2018 24 tesla
5 3039 93 honda
6 3736 37 honda
7 4645 36 toyota
7 4645 99 tesla
7 7869 74 nissan
7 7869 96 ford
7 7869 96 honda
8 9470 65 toyota
9 8676 23 honda
10 8736 81 gmc
10 8736 93 toyota
10 8736 99 honda
;
run;
proc sql;
create table want as
select *
from datadup
group by c_id, fda
having max(p_id)=p_id;
quit;
your want table has two of the same c_id for 1 and 7, but this code will give you the requested results you want.
data datadup;
input C_ID FDA P_ID;
datalines;
1 2001 85
1 2001 85
1 2099 85
1 2099 85
2 2054 85
2 2054 95
3 2043 54
4 2018 24
5 3039 93
6 3736 37
7 4645 36
7 4645 99
7 7869 74
7 7869 96
7 7869 96
8 9470 65
9 8676 23
10 8736 81
10 8736 93
10 8736 99
;
run;
proc sql;
create table want as
select c_id, fda, max(p_id)
from datadup
group by c_id, fda;
quit;
Thank you VDD,
I think your code correctly removed the duplicates for first logic, but not for the second one. For example if you look at C_ID with 7, it didn't come correct.
Thanks!
have in datadup
7 4645 36
7 4645 99
7 7869 74
7 7869 96
7 7869 96
the want
7 4645 99
7 7869 96
7 7869 96
has a dup you can not have the 7 7869 96 twice based on the request.
@sandrube For rows with the same C_ID and FDA,select the one with the highest P_ID.
Oh yes, I understand what you are trying to say, and you are correct. There are other variables and I didn't put them to make it simple.
Ok what are those other things to make it work for you or has your request been completed?
Hello VDD, thank you so much for the help!
Here is the data:
data datadup;
input C_ID FDA P_ID cars$;
datalines;
1 2001 85 honda
1 2001 85 tesla
1 2099 85 toyota
1 2099 85 nissan
2 2054 85 toyota
2 2054 95 tesla
3 2043 54 toyota
4 2018 24 tesla
5 3039 93 honda
6 3736 37 honda
7 4645 36 toyota
7 4645 99 tesla
7 7869 74 nissan
7 7869 96 ford
7 7869 96 honda
8 9470 65 toyota
9 8676 23 honda
10 8736 81 gmc
10 8736 93 toyota
10 8736 99 honda
;
run;
ok so which car should belong to each of your want records?
Let me try to rephrase the logiv:
For rows with the same C_ID, keep those with highest FDA.
For rows with the same C_ID and FDA, keep the one with the highest P_ID.
Hope this help!
Thanks a lot!
hi @sandrube , @VDD 's approach is close, needs a small tweak if i understand your req correctly
data datadup;
input C_ID FDA P_ID cars$;
datalines;
1 2001 85 honda
1 2001 85 tesla
1 2099 85 toyota
1 2099 85 nissan
2 2054 85 toyota
2 2054 95 tesla
3 2043 54 toyota
4 2018 24 tesla
5 3039 93 honda
6 3736 37 honda
7 4645 36 toyota
7 4645 99 tesla
7 7869 74 nissan
7 7869 96 ford
7 7869 96 honda
8 9470 65 toyota
9 8676 23 honda
10 8736 81 gmc
10 8736 93 toyota
10 8736 99 honda
;
run;
proc sql;
create table want as
select *
from datadup
group by c_id, fda
having max(p_id)=p_id;
quit;
Yes, this is perfect. Thank you so much and really appreciate your help!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.