BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LisaYIN9309
Obsidian | Level 7

 

Hi everyone, I found it's wired that my SQL left join output duplicates values, seems the Cartesian product was not cleaned/selected by SAS. My left join didn't result in the same number of rows as in my left table. I run a sort with nodupkep option and cleaned up the duplicates by hand, but I am very curious what happened.

 

Here are the proc contents for my two table:

1.png2.png3.png4.png

 

 

Here are my SQL Left Join code:

proc sql;
create table MergedHDF as
select coalesce(p.ProductYear,q.ProductYear)as ProductYear,
       p.MeasurementYear,
       coalesce(p.SubmissionId, q.SubmissionId)as SubmissionId,
       coalesce(p.OrganizationId,q.OrganizationId)as OrganizationId,
       coalesce(p.OrganizationName,q.OrganizationName) as OrganizationName,
       coalesce(p.ProductLine, q.ProductLine) as ProductLine,
       coalesce(p.ReportingProduct, q.ReportingProduct) as ReportingProduct,
       coalesce(p.MeasureCode,q.MeasureCode) as MeasureCode,
       p.MeasureName, p.IndicatorName, p.IndicatorCode,
	   p.Rate, p.Numerator, p.Denominator, p.EligiblePopulation, 
       p.Type, p.Region, p.sum_num, p.sum_denom,p.AggregateRate,
       coalesce(p.Status,q.Status) as Status,q.states 
from averagedrate as p
left join
combinedstates as q
on  p.organizationid=q.organizationid and p.SubmissionId=q.submissionId and 
    p.ProductLine=q.ProductLine and p.ReportingProduct=q.ReportingProduct;
quit;

proc sort data=mergedHDF nodupkey;
by measurecode productyear organizationid submissionid productline reportingproduct status;
run;

 

Here are my Log:

 5.png

 

 

Note that there are 5426 rows, rather than 2816 rows as in my left table.

Here are the log after I remove duplicates:

 

6.png

 

This really confuses me, any idea on why this happened will be greatly appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
HB
Barite | Level 11 HB
Barite | Level 11

Because there are duplicates in your Q table.

 

data p;
	input 
	org submit product report uniquepone:$1. uniqueptwo:$1.;
datalines;
1 2 3 4 A B
2 2 2 2 C D
3 3 4 6 E F
;
run;

data q;
	input 
	org submit product report uniqueqone:$1. uniqueqtwo:$1.;
datalines;
1 2 3 4 G H
1 2 3 4 I J
2 2 2 2 K L
3 3 4 6 M N
3 3 4 6 O P
4 4 4 4 Q R
4 5 5 5 S T
;
run;


proc sql; 
	select p.org, p.submit, p.product, p.report, uniquepone, uniqueqtwo 
	from p left join q
	on p.org = q.org and p.submit = q.submit and p.product = q.product and p.report = q.report; 
quit;

You don't get three records.  You get five.

 

                                       The SAS System     14:45 Tuesday, November 28, 2017   4

                      org    submit   product    report  uniquepone  uniqueqtwo
 
                        1         2         3         4  A           H
                        1         2         3         4  A           J
                        2         2         2         2  C           L
                        3         3         4         6  E           P
                        3         3         4         6  E           N


View solution in original post

2 REPLIES 2
HB
Barite | Level 11 HB
Barite | Level 11

Because there are duplicates in your Q table.

 

data p;
	input 
	org submit product report uniquepone:$1. uniqueptwo:$1.;
datalines;
1 2 3 4 A B
2 2 2 2 C D
3 3 4 6 E F
;
run;

data q;
	input 
	org submit product report uniqueqone:$1. uniqueqtwo:$1.;
datalines;
1 2 3 4 G H
1 2 3 4 I J
2 2 2 2 K L
3 3 4 6 M N
3 3 4 6 O P
4 4 4 4 Q R
4 5 5 5 S T
;
run;


proc sql; 
	select p.org, p.submit, p.product, p.report, uniquepone, uniqueqtwo 
	from p left join q
	on p.org = q.org and p.submit = q.submit and p.product = q.product and p.report = q.report; 
quit;

You don't get three records.  You get five.

 

                                       The SAS System     14:45 Tuesday, November 28, 2017   4

                      org    submit   product    report  uniquepone  uniqueqtwo
 
                        1         2         3         4  A           H
                        1         2         3         4  A           J
                        2         2         2         2  C           L
                        3         3         4         6  E           P
                        3         3         4         6  E           N


LisaYIN9309
Obsidian | Level 7

Aha! I see, thank you @HB

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 13030 views
  • 3 likes
  • 2 in conversation