Desktop productivity for business analysts and programmers

Duplicate Rows After Left Join

Reply
Contributor
Posts: 64

Duplicate Rows After Left Join

Simple question probably, but I keep arranging and rearranging  distinct and group by in the code and I'm not getting anything different.

 

I have two tables that share an ID between the two of them. Let's say table a has revenue data associated with each ID and table two has an instance where two titles are associated with the same ID. I use a left join of table an and ask for a distinct ID, and even group by, but I keep getting two row back for that ID, one for each name from table 

 

proc sql;
create table tableau.metadata_match as 
select
distinct a.product_id,
a.region,
b.product_name,
a.transactions,
a.revenue
from table a
left join 
table b 
on a.product_id = b.product_id;
group by 1, 2 quit;

 

 

It returns these rows repeating the revenue. What I really want is only row for this.

product_idregionproduct_nametransactionsrevenue
54321SWproduct_a1256.11
54321SWproduct_b1256.11

 

 

sd

Super Contributor
Posts: 271

Re: Duplicate Rows After Left Join

I see few problem with the sql query.

 

You should remove group by statement becuase I do not find any aggregate function being used in the select clause.

Then try again.

 

 

Valued Guide
Posts: 629

Re: Duplicate Rows After Left Join

Which match in the second table should be selected?

 

Post some example input-datasets as data-steps, so that we see what you have.

Contributor
Posts: 64

Re: Duplicate Rows After Left Join

Posted in reply to andreas_lds

Hi Andreas,

 

I posted fake data to show an example of the scenario and what output I'm receiving. However, in the real scenario, If I need to specify just one of these, how do I do it by choosing either randomly or based off of occurrence using a date field (first occurrence, most recent occurrence)?

PROC Star
Posts: 399

Re: Duplicate Rows After Left Join

The problem is that you still have product in your query and there are 2 different values which gives you 2 rows of data.

If you remove it you will get 1 row.

But if you still need product in your result you need to descide which one to keep, as Andreas says Smiley Happy

//Fredrik

Contributor
Posts: 64

Re: Duplicate Rows After Left Join

 If I need to specify just one of these, how do I do it by choosing either randomly or based off of occurrence using a date field (first occurrence, most recent occurrence)?

Super User
Posts: 13,941

Re: Duplicate Rows After Left Join

I think that you may be misunderstanding what the DISTINCT actually attempts to do. It gets distinct combinations of ALL variables on the select statement not just the first variable.

Ask a Question
Discussion stats
  • 6 replies
  • 820 views
  • 0 likes
  • 5 in conversation