Hi All,
I need to know is there any way I can optimize this join. I am getting notes :
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. mywork.dates
A Cartesian join takes all records of data set 1 and merges them with all records in data set 2. If data set 1 has n1 records and data set 2 has n2 records, then the Cartesian join will have n1*n2 records (which may be reduced by a WHERE statement, but I think first all records are created and then those that don't meet the where condition are deleted). If n1 and n2 are large, this will take a long time.
The only way to "optimize" is if you can replace the Cartesian join with some other type of join. Maybe that's possible, maybe that's not possible, you tell us.
Which data set has the variable date_lvl_no?
Without example data and expected results we can't come up with much optimization and when you use just a comma between two sets that is the request for a Cartesian join. SAS provides a note about such in case that wasn't your intent.
from set1 , set2
You might be able to improve performance by sub-setting data earlier such as:
proc sql noprint; create table mywork.dates as select distinct a.prod_sk, a.geo_sk, b.date_sk, b.fiscal_start_dt, b.fiscal_end_dt from mywork.tmp_plan_actuals_pgr a, ( select * from mylib.date_nm where fiscal_start_dt >= &min_start_dt. and fiscal_end_dt <= &max_end_dt.) as b where date_lvl_no = &time_low_lvl.; quit;
Do the same about only selecting the records where date_lvl_no is desired from which set it comes from, which is why I asked.
This reduces the number of joins between the two sets which MAY significantly improve run time.
Caution: Is B.fiscal_end_dt ever missing? If it is missing then it is less than any value for &max_end_dt. Is that your intend or did you only want the fiscal_end_dt when not missing?
The only way to remove that note is a setting that will suppress all notes. So is not recommended. It is not an error. It is a reminder that such as join is in your code. Such Cartesian joins, by definition, join every record in one set with every record in the other set. That step cannot be optimized. Ever. Period.
If you are going to use Cartesian joins it is a good idea to keep a good grasp on your data set sizes and exactly how every element of the data is actually used to get reasonable performance out of everything else.
Are there no key columns that are in both input tables? Your query is missing a join between at least one column in each input table, so that's why you are getting the Cartesian product note.
Why do you need to improve performance when the query real time is less than 1 second? Are you getting the right result set?
Show examples of the data (does not need to be REAL data, just something that shows the right relationships).
If there really is no variables in common between the two tables then you cannot do anything other than a Cartesian product.
But you can fool PROC SQL into not writing the message by just adding a variable to each input and joining on it.
Here is trivial example you can try.
proc sql nowarn;
create table test as
select *
from (select distinct sex,1 as x from sashelp.class) a
, (select distinct age,1 as x from sashelp.class) b
;
create table test2(drop=x) as
select *
from (select distinct sex,1 as x from sashelp.class) a
full join (select distinct age,1 as x from sashelp.class) b
on a.x=b.x
;
quit;
NOTE that the variable date_lvl_no mentioned in your WHERE clause is not output and you do not say where it came from. You could make the problem easier by just excluding those observations from even being considered.
So if DATE_LVL_NO is coming from mylib.date_nm then your query looks like:
create table mywork.dates as
select distinct
a.prod_sk
, a.geo_sk
, b.date_sk
, b.fiscal_start_dt
, b.fiscal_end_dt
from (select prod_sk,geo_sk,1 as x from mywork.tmp_plan_actuals_pgr) a
full join
(select fiscal_start_dt,fiscal_end_dt,date_sk,1 as x
from mylib.date_nm
where fiscal_start_dt >= &min_start_dt.
and fiscal_end_dt <= &max_end_dt.
and date_lvl_no = &time_low_lvl.
) b
on a.x=b.x
;
It might or might not run any faster. But at least you won't get the NOTE in the LOG.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.