BookmarkSubscribeRSS Feed
Jarvin99
Obsidian | Level 7

Hi all, I face an error during fuzzy merging, and attached are my two sample datasets.

Below is my code:

proc sql;
create table try
as select a.gvkey, a.comnam, b.*, compged(a.comnam, b.name) as compged_score
	from com2021 as a, sp as b
	where calculated compged_score le 50
	order by b.name;
quit;

The merged data 'try' is supposed to contain all variables filled, so I am not sure why some variables are empty. The log shows " The execution of this query involves performing one or more Cartesian product joins that can not be optimized."

 

I am not sure how to solve the log. Pls helps!

3 REPLIES 3
ballardw
Super User

Look at your Com2021 data set. Lots of records have missing values for the Comnam variable. The first one for example.

 

The "not optimized" message is coming because you use

from com2021 as a, sp as b

That , says you a doing a Cartesian join. Every record in A with every record in B. There is no way to "optimize" such a forced join. This message belongs there.

 

When I run your code with those sets I get no records in the output as the smallest Compged_score is 110.

Jarvin99
Obsidian | Level 7

What if I do not restrict? Why my 'country' variable is empty after merge? 

Also, do you mean it is 1-to-1, so there is no optimization in this case?

Then, how should I do to obtain 'gvkey' from Com2021 for firms in Sp using firm names?

 

Thank you.

Reeza
Super User

@Jarvin99 wrote:

Also, do you mean it is 1-to-1, so there is no optimization in this case?


It's a one to all/many join.

Every record from table A is joined to every record in Table B. 

If you have 10 records in TableA and 20 records in TableB, there will 10*20 comparisons and 200 records generated if you do not filter the results. If in TableB, there are 3 empty rows, then there will be 10*3 = 30 empty records for the second variable in the data set. 

 

If you have a large data set this can be very computationally intensive. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 316 views
  • 0 likes
  • 3 in conversation