BookmarkSubscribeRSS Feed
Dogo23
Quartz | Level 8

Hi all,

 

I'm assuming this is a simple answer but here it goes:

 

I have two tables for an email campaign: one with engagement levels, the other with revenue. I have the data rolled up to a regional grouping, and the unique ID's for each campaign are in both tables, so that is what I am using to join the two tables together. I basically want by campaign, by group - the engagements and revenue combined. 

 

My problem is that I am getting three matching rows returned now for each value that should be unique. 

 

Note that there are some campaigns without revenue, so I am using a left join to show the campaigns in the engagement table, and then also revenue where there is a match.

 

proc sql;
create table match as
select
distinct a.campaign_id,
a.region,
a.program,
a.processed_dttm,
a.sent,
a.open,
a.click,
b.bookings,
b.revenue
from table.engagement a
left join
table.revenue b
on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null
group by 1;
quit;

 

Thanks for you help ahead of time!

9 REPLIES 9
Reeza
Super User

That means one of your tables has duplicates. Figure out which one first.

Dogo23
Quartz | Level 8

Just checked both tables and there are no duplicates. There's a unique row for each row with the rolled up data.

Dogo23
Quartz | Level 8

Sorry, I meant to say unique row for each group of rolled up data in the table.

Reeza
Super User

Check your log, do you have an error in your code?

on a.campaign_id_id = b.campaign_id

 

Otherwise, If that join is correct with the variable names then you shouldn't get duplicates.

Please post the output from the following:

 

 

proc sql;
select count(campaign_id_id) as N, count(distinct campaign_id_id) as N_Distinct
from table.engagement;
select count(campaign_id) as N, count(distinct campaign_id) as N_Distinct
from table.revenue;
quit;

 

Dogo23
Quartz | Level 8

Capture.PNG

Dogo23
Quartz | Level 8

Sorry, the first one I posted had distinct for both columns, here's an updated version: 

 

Capture.PNG

ballardw
Super User

When your N is 2458 and the N_distinct is 887 then you either have one of the campaign_id values duplicated about 1570 times or many of the campaign_id values have one or more duplicates, averaging about 2.8 observations per id.

 

Perhaps your Where clause is referencing the wrong Id variable?

Reeza
Super User

@schlotty23 wrote:

Sorry, the first one I posted had distinct for both columns, here's an updated version: 

 

Capture.PNG


That means you have duplicates....in both data sets. 

 

What makes you think it's unique? Are you sure you're using the correct files, joining on the correct variables? 

on a.campaign_id_id = b.campaign_id
where a.campaign_id is not null

 

Do you have two variables or should the second one be campaign_id_id?

 

Either way, you have duplicates, so you'll get multiples. 

Dogo23
Quartz | Level 8

Figured it out. I had to join both on the campaign id AND the region in the join in order to obtain the desired results.

 

Thanks all!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 6209 views
  • 2 likes
  • 3 in conversation