Desktop productivity for business analysts and programmers

Duplicate values after join

Reply
Contributor
Posts: 64

Duplicate values after join

Hi all,

 

I'm assuming this is a simple answer but here it goes:

 

I have two tables for an email campaign: one with engagement levels, the other with revenue. I have the data rolled up to a regional grouping, and the unique ID's for each campaign are in both tables, so that is what I am using to join the two tables together. I basically want by campaign, by group - the engagements and revenue combined. 

 

My problem is that I am getting three matching rows returned now for each value that should be unique. 

 

Note that there are some campaigns without revenue, so I am using a left join to show the campaigns in the engagement table, and then also revenue where there is a match.

 

proc sql;
create table match as
select
distinct a.campaign_id,
a.region,
a.program,
a.processed_dttm,
a.sent,
a.open,
a.click,
b.bookings,
b.revenue
from table.engagement a
left join
table.revenue b
on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null
group by 1;
quit;

 

Thanks for you help ahead of time!

Super User
Posts: 24,004

Re: Duplicate values after join

That means one of your tables has duplicates. Figure out which one first.

Contributor
Posts: 64

Re: Duplicate values after join

Just checked both tables and there are no duplicates. There's a unique row for each row with the rolled up data.

Contributor
Posts: 64

Re: Duplicate values after join

Sorry, I meant to say unique row for each group of rolled up data in the table.

Super User
Posts: 24,004

Re: Duplicate values after join

Check your log, do you have an error in your code?

on a.campaign_id_id = b.campaign_id

 

Otherwise, If that join is correct with the variable names then you shouldn't get duplicates.

Please post the output from the following:

 

 

proc sql;
select count(campaign_id_id) as N, count(distinct campaign_id_id) as N_Distinct
from table.engagement;
select count(campaign_id) as N, count(distinct campaign_id) as N_Distinct
from table.revenue;
quit;

 

Contributor
Posts: 64

Re: Duplicate values after join

Capture.PNG

Contributor
Posts: 64

Re: Duplicate values after join

Sorry, the first one I posted had distinct for both columns, here's an updated version: 

 

Capture.PNG

Super User
Posts: 13,941

Re: Duplicate values after join

When your N is 2458 and the N_distinct is 887 then you either have one of the campaign_id values duplicated about 1570 times or many of the campaign_id values have one or more duplicates, averaging about 2.8 observations per id.

 

Perhaps your Where clause is referencing the wrong Id variable?

Super User
Posts: 24,004

Re: Duplicate values after join


schlotty23 wrote:

Sorry, the first one I posted had distinct for both columns, here's an updated version: 

 

Capture.PNG


That means you have duplicates....in both data sets. 

 

What makes you think it's unique? Are you sure you're using the correct files, joining on the correct variables? 

on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null

 

Do you have two variables or should the second one be campaign_id_id?

 

Either way, you have duplicates, so you'll get multiples. 

Contributor
Posts: 64

Re: Duplicate values after join

Figured it out. I had to join both on the campaign id AND the region in the join in order to obtain the desired results.

 

Thanks all!

Ask a Question
Discussion stats
  • 9 replies
  • 570 views
  • 2 likes
  • 3 in conversation