09-21-2012 02:29 PM
I want to remove duplicates in a table based on one column. First of all, I would like to get a better understanding of the sql inner join command in order, maybe, to create the table as I want in the first place.
I have two tables :
What I want to do is to create a table named final_table_with_name where for each Client_No in sasuser.final_table I want to match one name. As of right now I am using proc sql inner join client_no on (client_name.client_no = final_table.client_no). The thing is that this query returns me two records for client number that have more than one client name. What is the proper way to use join and have one client_name per client_no (even if there is more than one match)?
I would also like to know how to properly remove duplicates based on one column :
If we take my previous example I would try something like the following but I am unable to make it work :
proc sort data=sasuser.final_table_with_name out=want ;
Hope you can help me with this.
Thank you for your help and time!
09-21-2012 02:53 PM
Is this what you want? (presort by client_no may be needed for both tables)
input Client_No$ Info1$;
input Client_No$ Client_name$&20.;
1234 Julie Tremblay
merge h1 h2;
Or even simpler:
update h1 h2;
09-21-2012 03:03 PM
Thank you Haikuo. Your method works great as well.
The other part of my question is with regards with sql join command.
Every time I use left join or inner join I get multiple matches for one variable, when I would like to have 1 match per variable ( as a vlookup would do in Excel).
Most of the time this happen when I want to have every observation from one table and 1 result from the other and I can't user inner join since some of the observations in the first table aren't in the second table and I still want to have them in the created table with a NULL value as the info from the 2nd table.
I usually use left join (since I want to make sure to have every obesrvation, even those who don't have any match).
What is the proper sql procedure to create a table where all the data from the left is there, but it is only matched with one item from the table on the right.
Hope you understand what I am trying to explain.
Thank you for your help and time
09-21-2012 03:19 PM
Ok, basically you have two questions:
Q1. Only one match will be kept, even though there are multiple matches in table2:
A1: Not by SQL. Unless the records is completely duplicated, SQL is not able to keep one record from the pool of Cartesian Products.
Q2. For keys existing in table1, but not in table2, so the variables from table2 will be missing in the joined table.
A1: Yes, SQL left join will just do that automatically.
Need further help from the community? Please ask a new question.