BookmarkSubscribeRSS Feed
claremc
Obsidian | Level 7

Hello, 

I am using gedscore to join two datasets by name. (I am really trying to do a fuzzy join to look for duplicate names between the two datasets). I keep getting the error " NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized" 

 

Here is my code: 

%let maxscore=201;
proc sql;
create table survey_trackingdups as
select a.* , b.*,
compged(a.name2,b.name1,&maxscore,'iL' ) as gedscore
from tracking as a, survey as b
where calculated gedscore < &maxscore
order by calculated gedscore;
quit;

I previously formatted both datasets to have the same format for name1 and name2. 

 

Any idea what is going on with the join? This code has worked for me before while joining other datasets. I have looked into this error and can't find any helpful solutions on other community blog posts.

 

Thanks, 

Clare

1 REPLY 1
ballardw
Super User

@claremc wrote:

Hello, 

I am using gedscore to join two datasets by name. (I am really trying to do a fuzzy join to look for duplicate names between the two datasets). I keep getting the error " NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized" 

 

Here is my code: 

%let maxscore=201;
proc sql;
create table survey_trackingdups as
select a.* , b.*,
compged(a.name2,b.name1,&maxscore,'iL' ) as gedscore
from tracking as a, survey as b
where calculated gedscore < &maxscore
order by calculated gedscore;
quit;

I previously formatted both datasets to have the same format for name1 and name2. 

 

Any idea what is going on with the join? This code has worked for me before while joining other datasets. I have looked into this error and can't find any helpful solutions on other community blog posts.

 

Thanks, 

Clare


 

This in not an error: " NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized". SAS will say ERROR when it is an error. This is telling that the behavior of the code might have some behavior you didn't intend.

It just means that the SAS can't find a "nicer" way to combine all the records to get what you want. Cartesian joins, combining every record from one set with every record in another can create large data sets and long run times. So SAS is telling you that may happen with this code. Two data sets of 1million records each used this way would create 1 trillion comparisons.

 

This occurs because of the way you requested combinations of all records and then apply the where to the results. ALL of the combinations have to be considered to apply the where

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 329 views
  • 0 likes
  • 2 in conversation