BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nicnad
Fluorite | Level 6

Hi,

I want to remove duplicates in a table based on one column. First of all, I would like to get a better understanding of the sql inner join command in order, maybe, to create the table as I want in the first place.

I have two tables :

sasuser.final_table :

Client_NoInfo1
1234aaa
5678bbb
9123ccc
4567ddd

sasuser.client_name :

Client_NoClient_name
1234Julie
1234Julie Tremblay
5678Alex
5678Alexander
9123Bob
4567George

What I want to do is to create a table named final_table_with_name where for each Client_No in sasuser.final_table I want to match one name. As of right now I am using proc sql inner join client_no on (client_name.client_no = final_table.client_no). The thing is that this query returns me two records for client number that have more than one client name. What is the proper way to use join and have one client_name per client_no (even if there is more than one match)?

I would also like to know how to properly remove duplicates based on one column :

If we take my previous example I would try something like the following but I am unable to make it work :

data want;

   set sasuser.final_table_with_name;

   by client_no;

  if last.client_no;

run;

proc sort data=sasuser.final_table_with_name out=want ;

   by client_no;

run;

Hope you can help me with this.

Thank you for your help and time!

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

to remove observations with duplicate by variables try:

proc sort data=your-data  out=newdata nodupkey ;

by your-variable;

run;

View solution in original post

6 REPLIES 6
Linlin
Lapis Lazuli | Level 10

to remove observations with duplicate by variables try:

proc sort data=your-data  out=newdata nodupkey ;

by your-variable;

run;

nicnad
Fluorite | Level 6

Works great linlin thank you

Haikuo
Onyx | Level 15

Is this what you want? (presort by client_no may be needed for both tables)

data h1;

  input Client_No$ Info1$;

cards;

1234 aaa

4567 ddd 

5678 bbb

9123 ccc

;

data h2;

input Client_No$ Client_name$&20.;

cards;

1234 Julie

1234 Julie Tremblay

4567 George

5678 Alex

5678 Alexander

9123 Bob

;

data want;

  merge h1 h2;

    by client_no;

        if last.client_no;

run;

Or even simpler:

data h1;

  update h1 h2;

    by client_no;

run;

Haikuo

nicnad
Fluorite | Level 6

Thank you Haikuo. Your method works great as well.

The other part of my question is with regards with sql join command.

Every time I use left join or inner join I get multiple matches for one variable, when I would like to have 1 match per variable ( as a vlookup would do in Excel).

Most of the time this happen when I want to have every observation from one table and 1 result from the other and I can't user inner join since some of the observations in the first table aren't in the second table and I still want to have them in the created table with a NULL value as the info from the 2nd table.

I usually use left join (since I want to make sure to have every obesrvation, even those who don't have any match).

What is the proper sql procedure to create a table where all the data from the left is there, but it is only matched with one item from the table on the right.

Hope you understand what I am trying to explain.

Thank you for your help and time

Haikuo
Onyx | Level 15

Ok, basically you have two questions:

Q1. Only one match will be kept, even though there are multiple matches in table2:

A1: Not by SQL. Unless the records is completely duplicated, SQL is not able to keep one record from the pool of Cartesian Products.

Q2. For keys existing in table1, but not in table2, so the variables from table2 will be missing in the joined table.

A1: Yes, SQL left join will just do that automatically.

Haikuo

nicnad
Fluorite | Level 6

Thank you Haikuo.

This answer exactly my question.

Thank you very much to both of you for your help and time.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 24803 views
  • 6 likes
  • 3 in conversation