Hi,
I want to remove duplicates in a table based on one column. First of all, I would like to get a better understanding of the sql inner join command in order, maybe, to create the table as I want in the first place.
I have two tables :
sasuser.final_table :
| Client_No | Info1 | 
|---|---|
| 1234 | aaa | 
| 5678 | bbb | 
| 9123 | ccc | 
| 4567 | ddd | 
sasuser.client_name :
| Client_No | Client_name | 
|---|---|
| 1234 | Julie | 
| 1234 | Julie Tremblay | 
| 5678 | Alex | 
| 5678 | Alexander | 
| 9123 | Bob | 
| 4567 | George | 
What I want to do is to create a table named final_table_with_name where for each Client_No in sasuser.final_table I want to match one name. As of right now I am using proc sql inner join client_no on (client_name.client_no = final_table.client_no). The thing is that this query returns me two records for client number that have more than one client name. What is the proper way to use join and have one client_name per client_no (even if there is more than one match)?
I would also like to know how to properly remove duplicates based on one column :
If we take my previous example I would try something like the following but I am unable to make it work :
data want;
set sasuser.final_table_with_name;
by client_no;
if last.client_no;
run;
proc sort data=sasuser.final_table_with_name out=want ;
by client_no;
run;
Hope you can help me with this.
Thank you for your help and time!
to remove observations with duplicate by variables try:
proc sort data=your-data out=newdata nodupkey ;
by your-variable;
run;
to remove observations with duplicate by variables try:
proc sort data=your-data out=newdata nodupkey ;
by your-variable;
run;
Works great linlin thank you
Is this what you want? (presort by client_no may be needed for both tables)
data h1;
input Client_No$ Info1$;
cards;
1234 aaa
4567 ddd
5678 bbb
9123 ccc
;
data h2;
input Client_No$ Client_name$&20.;
cards;
1234 Julie
1234 Julie Tremblay
4567 George
5678 Alex
5678 Alexander
9123 Bob
;
data want;
merge h1 h2;
by client_no;
if last.client_no;
run;
Or even simpler:
data h1;
update h1 h2;
by client_no;
run;
Haikuo
Thank you Haikuo. Your method works great as well.
The other part of my question is with regards with sql join command.
Every time I use left join or inner join I get multiple matches for one variable, when I would like to have 1 match per variable ( as a vlookup would do in Excel).
Most of the time this happen when I want to have every observation from one table and 1 result from the other and I can't user inner join since some of the observations in the first table aren't in the second table and I still want to have them in the created table with a NULL value as the info from the 2nd table.
I usually use left join (since I want to make sure to have every obesrvation, even those who don't have any match).
What is the proper sql procedure to create a table where all the data from the left is there, but it is only matched with one item from the table on the right.
Hope you understand what I am trying to explain.
Thank you for your help and time
Ok, basically you have two questions:
Q1. Only one match will be kept, even though there are multiple matches in table2:
A1: Not by SQL. Unless the records is completely duplicated, SQL is not able to keep one record from the pool of Cartesian Products.
Q2. For keys existing in table1, but not in table2, so the variables from table2 will be missing in the joined table.
A1: Yes, SQL left join will just do that automatically.
Haikuo
Thank you Haikuo.
This answer exactly my question.
Thank you very much to both of you for your help and time.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
