I have one unbalanced panel data set and am using SQL to make it balanced by (1) taking the Cartesian and then (2) left joining the original set. The following data set has TICKER and DATE as two indices. I Cartesian joined distinct TICKER and distinct DATE as follows and then tried to left join ADJUST to the Cartesian.
data yahoo;
input ticker $ @@;
i=cats("https://query1.finance.yahoo.com/v7/finance/download/",ticker,
'?period1=-999999999999&period2=999999999999&interval=1d');
infile j url filevar=i firstobs=2 dsd end=k;
do until(k);
input date yymmdd10. +1 open high low close adjust volume;
output;
end;
cards;
BA DIS KO
;
proc sql;
create table i as
select i.ticker,j.date,adjust
from (select distinct ticker from yahoo) i,
(select distinct date from yahoo) j
left join yahoo k on i.ticker=k.ticker & j.date=k.date
order by ticker,date;
quit;
And the code above prints a correlated reference error message as follows.
14 proc sql;
15 create table i as
16 select i.ticker,j.date,adjust
17 from (select distinct ticker from yahoo) i,
18 (select distinct date from yahoo) j
19 left join yahoo k on i.ticker=k.ticker & j.date=k.date
20 order by ticker,date;
ERROR: Correlated reference to column ticker is not contained
within a subquery.
21 quit;
NOTE: The SAS System stopped processing this step because of
errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
I reviewed this note but couldn't understand because my ON clause for LEFT JOIN has all the variables already introduced: I.TICKER, K.TICKER, J.DATE, and K.DATE. I wonder whether this can be done in one SQL query.
P.S. CROSS JOIN rather than just comma as follows works.
proc sql;
create table i as
select i.ticker,j.date,adjust
from (select distinct ticker from yahoo) i
cross join (select distinct date from yahoo) j
left join yahoo k on i.ticker=k.ticker & j.date=k.date
order by ticker,date;
quit;
I wonder whether a comma and CROSS JOIN differ in other cases.
I think it is a matter of order of precedence. Your second query, with the CROSS JOIN statement, first executes the CROSS JOIN and then the LEFT JOIN.
But I think the comma has a different precedence, so that PROC SQL first creates the TICKER (I) query, and then tries to evaluate the LEFT JOIN of the DATE (J) query with the whole table (K). So the LEFT JOIN is only against the J query, which has no reference to the I alias.
> I wonder whether a comma and CROSS JOIN differ in other cases.
A comma is similar to an INNER JOIN.
I think it is a matter of order of precedence. Your second query, with the CROSS JOIN statement, first executes the CROSS JOIN and then the LEFT JOIN.
But I think the comma has a different precedence, so that PROC SQL first creates the TICKER (I) query, and then tries to evaluate the LEFT JOIN of the DATE (J) query with the whole table (K). So the LEFT JOIN is only against the J query, which has no reference to the I alias.
I think this explains because the original code also works after adding parentheses before and after the two SELECT clauses as follows.
proc sql;
create table i as
select i.ticker,j.date,adjust
from ((select distinct ticker from yahoo) i,
(select distinct date from yahoo) j)
left join yahoo k on i.ticker=k.ticker & j.date=k.date
order by ticker,date;
quit;
It seems PROC SQL tries to process JOIN first and then the comma unless the parentheses. Thanks.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.