Help using Base SAS procedures

Remove duplicates based on one column

Accepted Solution Solved
Reply
Regular Contributor
Posts: 186
Accepted Solution

Remove duplicates based on one column

Hi,

I want to remove duplicates in a table based on one column. First of all, I would like to get a better understanding of the sql inner join command in order, maybe, to create the table as I want in the first place.

I have two tables :

sasuser.final_table :

Client_NoInfo1
1234aaa
5678bbb
9123ccc
4567ddd

sasuser.client_name :

Client_NoClient_name
1234Julie
1234Julie Tremblay
5678Alex
5678Alexander
9123Bob
4567George

What I want to do is to create a table named final_table_with_name where for each Client_No in sasuser.final_table I want to match one name. As of right now I am using proc sql inner join client_no on (client_name.client_no = final_table.client_no). The thing is that this query returns me two records for client number that have more than one client name. What is the proper way to use join and have one client_name per client_no (even if there is more than one match)?

I would also like to know how to properly remove duplicates based on one column :

If we take my previous example I would try something like the following but I am unable to make it work :

data want;

   set sasuser.final_table_with_name;

   by client_no;

  if last.client_no;

run;

proc sort data=sasuser.final_table_with_name out=want ;

   by client_no;

run;

Hope you can help me with this.

Thank you for your help and time!


Accepted Solutions
Solution
‎09-21-2012 02:36 PM
Super Contributor
Posts: 1,636

Re: Remove duplicates based on one column

to remove observations with duplicate by variables try:

proc sort data=your-data  out=newdata nodupkey ;

by your-variable;

run;

View solution in original post


All Replies
Solution
‎09-21-2012 02:36 PM
Super Contributor
Posts: 1,636

Re: Remove duplicates based on one column

to remove observations with duplicate by variables try:

proc sort data=your-data  out=newdata nodupkey ;

by your-variable;

run;

Regular Contributor
Posts: 186

Re: Remove duplicates based on one column

Works great linlin thank you

Respected Advisor
Posts: 3,124

Re: Remove duplicates based on one column

Is this what you want? (presort by client_no may be needed for both tables)

data h1;

  input Client_No$ Info1$;

cards;

1234 aaa

4567 ddd 

5678 bbb

9123 ccc

;

data h2;

input Client_No$ Client_name$&20.;

cards;

1234 Julie

1234 Julie Tremblay

4567 George

5678 Alex

5678 Alexander

9123 Bob

;

data want;

  merge h1 h2;

    by client_no;

        if last.client_no;

run;

Or even simpler:

data h1;

  update h1 h2;

    by client_no;

run;

Haikuo

Regular Contributor
Posts: 186

Re: Remove duplicates based on one column

Thank you Haikuo. Your method works great as well.

The other part of my question is with regards with sql join command.

Every time I use left join or inner join I get multiple matches for one variable, when I would like to have 1 match per variable ( as a vlookup would do in Excel).

Most of the time this happen when I want to have every observation from one table and 1 result from the other and I can't user inner join since some of the observations in the first table aren't in the second table and I still want to have them in the created table with a NULL value as the info from the 2nd table.

I usually use left join (since I want to make sure to have every obesrvation, even those who don't have any match).

What is the proper sql procedure to create a table where all the data from the left is there, but it is only matched with one item from the table on the right.

Hope you understand what I am trying to explain.

Thank you for your help and time

Respected Advisor
Posts: 3,124

Re: Remove duplicates based on one column

Ok, basically you have two questions:

Q1. Only one match will be kept, even though there are multiple matches in table2:

A1: Not by SQL. Unless the records is completely duplicated, SQL is not able to keep one record from the pool of Cartesian Products.

Q2. For keys existing in table1, but not in table2, so the variables from table2 will be missing in the joined table.

A1: Yes, SQL left join will just do that automatically.

Haikuo

Regular Contributor
Posts: 186

Re: Remove duplicates based on one column

Thank you Haikuo.

This answer exactly my question.

Thank you very much to both of you for your help and time.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 5680 views
  • 6 likes
  • 3 in conversation