BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sandrube
Fluorite | Level 6

Hello everyone! 

 

I'm looking for some help in removing duplicates based on two logic.

 

Here is the data set:

 

data datadup;

input C_ID FDA P_ID;

datalines;

1 2001 85

1 2001 85

1 2099 85

1 2099 85

2 2054 85

2 2054 95

3 2043 54

4 2018 24

5 3039 93

6 3736 37

7 4645 36

7 4645 99

7 7869 74

7 7869 96

7 7869 96

8 9470 65

9 8676 23

10 8736 81

10 8736 93

10 8736 99

;

run;

 

LOGIC: For rows with the same C_ID, select those with highest FDA. For rows with the same C_ID and FDA,select the one with the highest P_ID.

 

The output should look like this:

 

data data_nodup;

input C_ID FDA P_ID;

datalines;

 

1 2099 85

1 2099 85

2 2054 95

3 2043 54

4 2018 24

5 3039 93

6 3736 37

7 4645 99

7 7869 96

7 7869 96

8 9470 65

9 8676 23

10 8736 99

;

run;

 

Thank you for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

hi @sandrube   ,  @VDD  's approach is close, needs a small tweak if i understand your req correctly

 

data datadup;
input C_ID FDA P_ID cars$;
datalines;
1 2001 85 honda
1 2001 85 tesla
1 2099 85 toyota
1 2099 85 nissan
2 2054 85 toyota
2 2054 95 tesla
3 2043 54 toyota
4 2018 24 tesla
5 3039 93 honda
6 3736 37 honda
7 4645 36 toyota
7 4645 99 tesla
7 7869 74 nissan
7 7869 96 ford
7 7869 96 honda
8 9470 65 toyota
9 8676 23 honda
10 8736 81 gmc
10 8736 93 toyota
10 8736 99 honda
;
run;
proc sql;
	create table want as
	select *
	from datadup
	group by c_id, fda
	having max(p_id)=p_id;
quit;

View solution in original post

12 REPLIES 12
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

your want table has two of the same c_id for 1 and 7, but this code will give you the requested results you want.

data datadup;
input C_ID FDA P_ID;
datalines;
1 2001 85
1 2001 85
1 2099 85
1 2099 85
2 2054 85
2 2054 95
3 2043 54
4 2018 24
5 3039 93
6 3736 37
7 4645 36
7 4645 99
7 7869 74
7 7869 96
7 7869 96
8 9470 65
9 8676 23
10 8736 81
10 8736 93
10 8736 99
;
run;
proc sql;
	create table want as
	select c_id, fda, max(p_id)
	from datadup
	group by c_id, fda;
quit;

 

sandrube
Fluorite | Level 6

Thank you VDD,

 

I think your code correctly removed the duplicates for first logic, but not for the second one. For example if you look at C_ID with 7, it didn't come correct.

 

Thanks!

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

have in datadup

7 4645 36

7 4645 99

7 7869 74

7 7869 96

7 7869 96

 

the want

7 4645 99

7 7869 96

7 7869 96

 

has a dup you can not have the 7 7869 96 twice based on the request.

 

@sandrube For rows with the same C_ID and FDA,select the one with the highest P_ID.

 

 

sandrube
Fluorite | Level 6

Oh yes, I understand what you are trying to say, and you are correct. There are other variables and I didn't put them to make it simple.

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

Ok what are those other things to make it work for you or has your request been completed?

sandrube
Fluorite | Level 6

Hello VDD, thank you so much for the help!

 

Here is the data:

 


data datadup;
input C_ID FDA P_ID cars$;
datalines;
1 2001 85 honda
1 2001 85 tesla
1 2099 85 toyota
1 2099 85 nissan
2 2054 85 toyota
2 2054 95 tesla
3 2043 54 toyota
4 2018 24 tesla
5 3039 93 honda
6 3736 37 honda
7 4645 36 toyota
7 4645 99 tesla
7 7869 74 nissan
7 7869 96 ford
7 7869 96 honda
8 9470 65 toyota
9 8676 23 honda
10 8736 81 gmc
10 8736 93 toyota
10 8736 99 honda
;
run;

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

ok so which car should belong to each of your want records?

 

sandrube
Fluorite | Level 6

Let me try to rephrase the logiv:

 

For rows with the same C_ID, keep those with highest FDA.

For rows with the same C_ID and FDA, keep the one with the highest P_ID.

 

Hope this help!

 

Thanks a lot!

novinosrin
Tourmaline | Level 20

hi @sandrube   ,  @VDD  's approach is close, needs a small tweak if i understand your req correctly

 

data datadup;
input C_ID FDA P_ID cars$;
datalines;
1 2001 85 honda
1 2001 85 tesla
1 2099 85 toyota
1 2099 85 nissan
2 2054 85 toyota
2 2054 95 tesla
3 2043 54 toyota
4 2018 24 tesla
5 3039 93 honda
6 3736 37 honda
7 4645 36 toyota
7 4645 99 tesla
7 7869 74 nissan
7 7869 96 ford
7 7869 96 honda
8 9470 65 toyota
9 8676 23 honda
10 8736 81 gmc
10 8736 93 toyota
10 8736 99 honda
;
run;
proc sql;
	create table want as
	select *
	from datadup
	group by c_id, fda
	having max(p_id)=p_id;
quit;
sandrube
Fluorite | Level 6

Yes, this is perfect. Thank you so much and really appreciate your help!

Reeza
Super User
Please mark the question as answered by selection one of the solutions above (not this post!)
sandrube
Fluorite | Level 6
Done! Thanks!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1115 views
  • 0 likes
  • 4 in conversation