Hi all,
I'm assuming this is a simple answer but here it goes:
I have two tables for an email campaign: one with engagement levels, the other with revenue. I have the data rolled up to a regional grouping, and the unique ID's for each campaign are in both tables, so that is what I am using to join the two tables together. I basically want by campaign, by group - the engagements and revenue combined.
My problem is that I am getting three matching rows returned now for each value that should be unique.
Note that there are some campaigns without revenue, so I am using a left join to show the campaigns in the engagement table, and then also revenue where there is a match.
proc sql;
create table match as
select
distinct a.campaign_id,
a.region,
a.program,
a.processed_dttm,
a.sent,
a.open,
a.click,
b.bookings,
b.revenue
from table.engagement a
left join
table.revenue b
on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null
group by 1;
quit;
Thanks for you help ahead of time!
That means one of your tables has duplicates. Figure out which one first.
Just checked both tables and there are no duplicates. There's a unique row for each row with the rolled up data.
Sorry, I meant to say unique row for each group of rolled up data in the table.
Check your log, do you have an error in your code?
on a.campaign_id_id = b.campaign_id
Otherwise, If that join is correct with the variable names then you shouldn't get duplicates.
Please post the output from the following:
proc sql;
select count(campaign_id_id) as N, count(distinct campaign_id_id) as N_Distinct
from table.engagement;
select count(campaign_id) as N, count(distinct campaign_id) as N_Distinct
from table.revenue;
quit;
Sorry, the first one I posted had distinct for both columns, here's an updated version:
When your N is 2458 and the N_distinct is 887 then you either have one of the campaign_id values duplicated about 1570 times or many of the campaign_id values have one or more duplicates, averaging about 2.8 observations per id.
Perhaps your Where clause is referencing the wrong Id variable?
@schlotty23 wrote:
Sorry, the first one I posted had distinct for both columns, here's an updated version:
That means you have duplicates....in both data sets.
What makes you think it's unique? Are you sure you're using the correct files, joining on the correct variables?
on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null
Do you have two variables or should the second one be campaign_id_id?
Either way, you have duplicates, so you'll get multiples.
Figured it out. I had to join both on the campaign id AND the region in the join in order to obtain the desired results.
Thanks all!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.