Hello,
I have two datasets, namely;
Dataset1: with variables; ID Evnop Syst Conyst Contr Metype Steno Epidem Entrant Adde
Dataset2: with variables; ID Entrant Adde Set1 Set2 Set3 Set4 Set5 Set6 Set7
I want to do FULL JOIN to join Dataset 1 to Dataset2 on variable "adde".
What will appropraite SAS proc SQL to use for the above?
Thank you.
You're missing some words.
You need to specify "CREATE TABLE <table-to-create> AS". And, you need to provide specific data set names for your source tables in the FROM clause. I used Table1 and Table2 here, but you change it as needed.
proc sql;
create table t12 AS
select
coalesce(t1.ID,t2.ID) as ID,
t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem,
coalesce(t1.Entrant,t2.Entrant) as Entrant,
coalesce(t1.Adde,t2.Adde) as Adde,
t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
from table1 t1 FULL JOIN table2 t2 on (t1.Adde = t2.Adde);
quit;
Full Join means that you want all records from both tables, right? Since some of the fields have common names across the tables, you might want to use coalesce to make sure that you get all values.
proc sql;
create table want as
select
coalesce(t1.ID,t2.ID) as ID,
t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem,
coalesce(t1.Entrant,t2.Entrant) as Entrant,
coalesce(t1.Adde,t2.Adde) as Adde,
t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
from dataset1 t1 FULL JOIN dataset2 t2 on (t1.Adde = t2.Adde);
quit;
Now...I'm not sure that's what you really want -- but you'll have to try it and see.
Thank you. Much appreciated. Yes, that's what i want.
However am getting some syntax errors
The dataset1 is my table1(t1) and dataset2 is my table2(t2) (just for clarification i mean to say table1 not dataset1)
This is what i put in sas;
proc sql;
create table_12
select
coalesce(t1.ID,t2.ID) as ID,
t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem,
coalesce(t1.Entrant,t2.Entrant) as Entrant,
coalesce(t1.Adde,t2.Adde) as Adde,
t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
from t1 FULL JOIN t2 on (t1.Adde = t2.Adde);
quit;
However, the coalesce is not highlighted. is it because of not putting the appropriate commas or semi-colon at the right places?
Also got these ERRORS in the SAS Log;
Syntax error, expecting one of the following: TABLE, VIEW
Syntax error, statement will be ignored.
NOTE: T he SAS system stopped processing this steps because of errors.
You're missing some words.
You need to specify "CREATE TABLE <table-to-create> AS". And, you need to provide specific data set names for your source tables in the FROM clause. I used Table1 and Table2 here, but you change it as needed.
proc sql;
create table t12 AS
select
coalesce(t1.ID,t2.ID) as ID,
t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem,
coalesce(t1.Entrant,t2.Entrant) as Entrant,
coalesce(t1.Adde,t2.Adde) as Adde,
t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
from table1 t1 FULL JOIN table2 t2 on (t1.Adde = t2.Adde);
quit;
Thank you. Made the necessary changes.
However, what is the difference between your t1 and Table1 or t2 or Table2?
This is because i have two files/tables (Table1 and Table2) which i want to Full Join.
t1 and t2 are aliases for the full table names. This allows us to use a shorthand in the SELECT and other clauses.
Thank you.
However, PROC SQL running has been on for more than 20 minutes now.
Table1 have over 58million observations and Table2 have over 16million observation. Dunno if this is the cause.
That's definitely the cause -- your result will be 74 million+ rows. It probably would have been wise to test this in smaller subsets to make sure it generates the results you want. The INOBS= option on PROC SQL can limit the number of records used to input -- good for quicker tests.
Thank you. Wish we have more people of your kind to give us clear and succinct ideas on PROC SQL.
After run time of 1:03:09.14 (just over an hour),
Table_12 was created with over 324 Million+ rows (yes over 300million) and 17 columns.
Please, what will be the appropriate INOBS= option on PROC SQL for my scenario for quicker tests?
Wow - there must have been some cartesian product interactions there -- where multiple combinations of records are created. This indicates that your JOIN criteria is probably not narrow enough.
I suggest that you watch this free webinar from our Ask the Experts series: SQL: An Introduction. It's probably just about 45 minutes long, and it will help you to understand these concepts.
That was a very helpful Webinar!
However, in the Webinar, the demonstration was for "inner join with summarization". Even though i clearly understood the cartesian product interactions in the video, there was little said about it and its prevention in the case of FULL JOIN(which is mine situation).
Is there a way to use the summary function in the FULL JOIN to prevent full blown cartesian product interaction?
Is the cartesian product interaction bad or its just another way of having a complex table?
Usually when we talk about "cartesian product" it's in the context of a query that pulls too many combinations of records -- more than we want. You could try adding more or different join criteria (in the ON clause) for to match maybe the ID (which is usually the name for a unique key, so might be closer to what you want).
If you want to try with a limited set of records, use PROC SQL INOBS=1000 (or some other relatively small number) to see if it's getting close to what you want.
check whether you indexes on the column you are joining on. try building indexes on columns you join.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.