- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I'm assuming this is a simple answer but here it goes:
I have two tables for an email campaign: one with engagement levels, the other with revenue. I have the data rolled up to a regional grouping, and the unique ID's for each campaign are in both tables, so that is what I am using to join the two tables together. I basically want by campaign, by group - the engagements and revenue combined.
My problem is that I am getting three matching rows returned now for each value that should be unique.
Note that there are some campaigns without revenue, so I am using a left join to show the campaigns in the engagement table, and then also revenue where there is a match.
proc sql;
create table match as
select
distinct a.campaign_id,
a.region,
a.program,
a.processed_dttm,
a.sent,
a.open,
a.click,
b.bookings,
b.revenue
from table.engagement a
left join
table.revenue b
on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null
group by 1;
quit;
Thanks for you help ahead of time!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That means one of your tables has duplicates. Figure out which one first.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just checked both tables and there are no duplicates. There's a unique row for each row with the rolled up data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, I meant to say unique row for each group of rolled up data in the table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Check your log, do you have an error in your code?
on a.campaign_id_id = b.campaign_id
Otherwise, If that join is correct with the variable names then you shouldn't get duplicates.
Please post the output from the following:
proc sql;
select count(campaign_id_id) as N, count(distinct campaign_id_id) as N_Distinct
from table.engagement;
select count(campaign_id) as N, count(distinct campaign_id) as N_Distinct
from table.revenue;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, the first one I posted had distinct for both columns, here's an updated version:
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When your N is 2458 and the N_distinct is 887 then you either have one of the campaign_id values duplicated about 1570 times or many of the campaign_id values have one or more duplicates, averaging about 2.8 observations per id.
Perhaps your Where clause is referencing the wrong Id variable?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@schlotty23 wrote:
Sorry, the first one I posted had distinct for both columns, here's an updated version:
That means you have duplicates....in both data sets.
What makes you think it's unique? Are you sure you're using the correct files, joining on the correct variables?
on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null
Do you have two variables or should the second one be campaign_id_id?
Either way, you have duplicates, so you'll get multiples.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Figured it out. I had to join both on the campaign id AND the region in the join in order to obtain the desired results.
Thanks all!