SAS Enterprise Guide

Dogo23 · Posted 01-03-2018 06:07 PM

Hi all,

I'm assuming this has a simple answer, but here goes:

I have two tables for an email campaign: one with engagement levels, the other with revenue. The data is rolled up to a regional grouping, and the unique campaign IDs are in both tables, so that is what I am using to join them. Basically, I want the engagement and revenue figures combined by campaign and by group.

My problem is that I am getting three matching rows returned now for each value that should be unique.

Note that there are some campaigns without revenue, so I am using a left join to show the campaigns in the engagement table, and then also revenue where there is a match.

proc sql;
    create table match as
    select distinct
        a.campaign_id,
        a.region,
        a.program,
        a.processed_dttm,
        a.sent,
        a.open,
        a.click,
        b.bookings,
        b.revenue
    from table.engagement a
    left join table.revenue b
        on a.campaign_id_id = b.campaign_id
    where a.campaign_id is not null
    group by 1;
quit;

Thanks for your help ahead of time!

Reeza · Posted 01-03-2018 06:14 PM

That means one of your tables has duplicates. Figure out which one first.
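One way to run that check is to list the join key values that occur on more than one row. This is only a sketch: it assumes the key column in table.revenue is named campaign_id, as in the query above, and the same query would need to be repeated against table.engagement with whatever that table's key column is actually called.

```sas
proc sql;
    /* List campaign IDs that appear on more than one row of the revenue table. */
    /* Assumption: the key column is named campaign_id, as in the original query. */
    select campaign_id,
           count(*) as n_rows
    from table.revenue
    group by campaign_id
    having count(*) > 1
    order by n_rows desc;
quit;
```

If this returns no rows, the key is unique in that table; any rows it does return are the IDs that will multiply in the join.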

Dogo23 · Posted 01-03-2018 06:19 PM

Just checked both tables and there are no duplicates. There's a unique row for each row with the rolled up data.

Dogo23 · Posted 01-03-2018 06:23 PM

Sorry, I meant to say unique row for each group of rolled up data in the table.

Reeza · Posted 01-03-2018 06:32 PM

Check your log, do you have an error in your code?

on a.campaign_id_id = b.campaign_id

Otherwise, if that join is correct with those variable names, then you shouldn't get duplicates.

Please post the output from the following:

proc sql;
    select count(campaign_id_id) as N,
           count(distinct campaign_id_id) as N_Distinct
    from table.engagement;

    select count(campaign_id) as N,
           count(distinct campaign_id) as N_Distinct
    from table.revenue;
quit;

Dogo23 · Posted 01-03-2018 06:42 PM

Dogo23 · Posted 01-03-2018 06:46 PM

Sorry, the first one I posted had distinct for both columns, here's an updated version:

ballardw · Posted 01-03-2018 07:11 PM

When your N is 2458 and N_Distinct is 887, then either one campaign_id value is duplicated about 1570 times, or many campaign_id values have one or more duplicates, averaging about 2.8 observations per ID.

Perhaps your WHERE clause is referencing the wrong ID variable?
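When an ID repeats like this, it can help to test whether the ID plus a second column identifies a row uniquely. The sketch below counts distinct (campaign_id, region) pairs and compares that with the total row count. It assumes table.revenue has a region column like the one selected from table.engagement; the thread does not show that table's layout, so treat the column name as hypothetical.

```sas
proc sql;
    /* Total rows in the revenue table. */
    select count(*) as n_rows
    from table.revenue;

    /* Distinct (campaign_id, region) pairs. Assumption: a region column exists here. */
    select count(*) as n_pairs
    from (select distinct campaign_id, region
          from table.revenue);
quit;
```

If n_pairs equals n_rows, the pair is a unique key even though campaign_id alone is not.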

Reeza · Posted 01-03-2018 08:36 PM

@schlotty23 wrote:

Sorry, the first one I posted had distinct for both columns, here's an updated version:

That means you have duplicates... in both data sets.

What makes you think it's unique? Are you sure you're using the correct files, joining on the correct variables?

on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null

Do you have two variables or should the second one be campaign_id_id?

Either way, you have duplicates, so you'll get multiples.

Dogo23 · Posted 01-04-2018 02:51 PM

Figured it out. I had to join on both the campaign ID and the region to get the desired results.

Thanks all!
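For later readers, the fix described above might look roughly like the following. The final query was not posted, so this is a reconstruction under assumptions: the key column is taken to be campaign_id in both tables (the original ON clause used campaign_id_id), and table.revenue is assumed to carry a region column. DISTINCT and GROUP BY 1 are dropped here because, once the join key is unique, they should no longer be needed; PROC SQL generally treats a GROUP BY with no summary function as an ORDER BY anyway.

```sas
proc sql;
    create table match as
    select a.campaign_id,
           a.region,
           a.program,
           a.processed_dttm,
           a.sent,
           a.open,
           a.click,
           b.bookings,
           b.revenue
    from table.engagement a
    left join table.revenue b
        on  a.campaign_id = b.campaign_id   /* assumed key name in both tables */
        and a.region      = b.region        /* assumed: revenue also has region */
    where a.campaign_id is not null;
quit;
```

Because it is still a left join, campaigns with no revenue keep their engagement row, with bookings and revenue left missing.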
