Hi everyone, I found it's wired that my SQL left join output duplicates values, seems the Cartesian product was not cleaned/selected by SAS. My left join didn't result in the same number of rows as in my left table. I run a sort with nodupkep option and cleaned up the duplicates by hand, but I am very curious what happened.
Here are the proc contents for my two table:
Here are my SQL Left Join code:
proc sql;
create table MergedHDF as
select coalesce(p.ProductYear,q.ProductYear)as ProductYear,
p.MeasurementYear,
coalesce(p.SubmissionId, q.SubmissionId)as SubmissionId,
coalesce(p.OrganizationId,q.OrganizationId)as OrganizationId,
coalesce(p.OrganizationName,q.OrganizationName) as OrganizationName,
coalesce(p.ProductLine, q.ProductLine) as ProductLine,
coalesce(p.ReportingProduct, q.ReportingProduct) as ReportingProduct,
coalesce(p.MeasureCode,q.MeasureCode) as MeasureCode,
p.MeasureName, p.IndicatorName, p.IndicatorCode,
p.Rate, p.Numerator, p.Denominator, p.EligiblePopulation,
p.Type, p.Region, p.sum_num, p.sum_denom,p.AggregateRate,
coalesce(p.Status,q.Status) as Status,q.states
from averagedrate as p
left join
combinedstates as q
on p.organizationid=q.organizationid and p.SubmissionId=q.submissionId and
p.ProductLine=q.ProductLine and p.ReportingProduct=q.ReportingProduct;
quit;
proc sort data=mergedHDF nodupkey;
by measurecode productyear organizationid submissionid productline reportingproduct status;
run;
Here are my Log:
Note that there are 5426 rows, rather than 2816 rows as in my left table.
Here are the log after I remove duplicates:
This really confuses me, any idea on why this happened will be greatly appreciated!
Because there are duplicates in your Q table.
data p;
input
org submit product report uniquepone:$1. uniqueptwo:$1.;
datalines;
1 2 3 4 A B
2 2 2 2 C D
3 3 4 6 E F
;
run;
data q;
input
org submit product report uniqueqone:$1. uniqueqtwo:$1.;
datalines;
1 2 3 4 G H
1 2 3 4 I J
2 2 2 2 K L
3 3 4 6 M N
3 3 4 6 O P
4 4 4 4 Q R
4 5 5 5 S T
;
run;
proc sql;
select p.org, p.submit, p.product, p.report, uniquepone, uniqueqtwo
from p left join q
on p.org = q.org and p.submit = q.submit and p.product = q.product and p.report = q.report;
quit;
You don't get three records. You get five.
The SAS System 14:45 Tuesday, November 28, 2017 4 org submit product report uniquepone uniqueqtwo 1 2 3 4 A H 1 2 3 4 A J 2 2 2 2 C L 3 3 4 6 E P 3 3 4 6 E N
Because there are duplicates in your Q table.
data p;
input
org submit product report uniquepone:$1. uniqueptwo:$1.;
datalines;
1 2 3 4 A B
2 2 2 2 C D
3 3 4 6 E F
;
run;
data q;
input
org submit product report uniqueqone:$1. uniqueqtwo:$1.;
datalines;
1 2 3 4 G H
1 2 3 4 I J
2 2 2 2 K L
3 3 4 6 M N
3 3 4 6 O P
4 4 4 4 Q R
4 5 5 5 S T
;
run;
proc sql;
select p.org, p.submit, p.product, p.report, uniquepone, uniqueqtwo
from p left join q
on p.org = q.org and p.submit = q.submit and p.product = q.product and p.report = q.report;
quit;
You don't get three records. You get five.
The SAS System 14:45 Tuesday, November 28, 2017 4 org submit product report uniquepone uniqueqtwo 1 2 3 4 A H 1 2 3 4 A J 2 2 2 2 C L 3 3 4 6 E P 3 3 4 6 E N
Aha! I see, thank you @HB
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.