BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mystik
Obsidian | Level 7

Hello,

I have two datasets, namely;

 

Dataset1: with variables; ID    Evnop    Syst    Conyst    Contr    Metype    Steno    Epidem    Entrant    Adde 

 

Dataset2: with variables; ID    Entrant   Adde   Set1   Set2      Set3    Set4    Set5    Set6    Set7

 

I want to do FULL JOIN to join Dataset 1 to Dataset2 on variable "adde".

 

What will appropraite SAS proc SQL to use for the above?

 

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisHemedinger
Community Manager

You're missing some words.  

 

You need to specify "CREATE TABLE <table-to-create> AS".  And, you need to provide specific data set names for your source tables in the FROM clause.  I used Table1 and Table2 here, but you change it as needed.

 

proc sql;
 create table t12 AS
  select 
        coalesce(t1.ID,t2.ID) as ID,
        t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem, 
        coalesce(t1.Entrant,t2.Entrant) as Entrant, 
        coalesce(t1.Adde,t2.Adde) as Adde,
        t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
    from table1 t1 FULL JOIN table2 t2 on (t1.Adde = t2.Adde);
  quit;
Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.

View solution in original post

13 REPLIES 13
ChrisHemedinger
Community Manager

Full Join means that you want all records from both tables, right?  Since some of the fields have common names across the tables, you might want to use coalesce to make sure that you get all values.

 

proc sql;
 create table want as 
  select 
        coalesce(t1.ID,t2.ID) as ID,
        t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem, 
        coalesce(t1.Entrant,t2.Entrant) as Entrant, 
        coalesce(t1.Adde,t2.Adde) as Adde,
        t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
    from dataset1 t1 FULL JOIN dataset2 t2 on (t1.Adde = t2.Adde);
  quit;

Now...I'm not sure that's what you really want -- but you'll have to try it and see.

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
Mystik
Obsidian | Level 7

Thank you. Much appreciated. Yes, that's what i want.

However am getting some syntax errors

 

The dataset1 is my table1(t1) and dataset2 is my table2(t2) (just for clarification i mean to say table1 not dataset1)

 

This is what i put in sas;

proc sql;
 create table_12
  select 
        coalesce(t1.ID,t2.ID) as ID,
        t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem, 
        coalesce(t1.Entrant,t2.Entrant) as Entrant, 
        coalesce(t1.Adde,t2.Adde) as Adde,
        t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
    from t1 FULL JOIN t2 on (t1.Adde = t2.Adde);
  quit;

However, the coalesce is not highlighted. is it because of not putting the appropriate commas or semi-colon at the right places?

 

Also got these ERRORS in the SAS Log;

Syntax error, expecting one of the following: TABLE, VIEW

Syntax error, statement will be ignored.

NOTE: T he SAS system stopped processing this steps because of errors.

ChrisHemedinger
Community Manager

You're missing some words.  

 

You need to specify "CREATE TABLE <table-to-create> AS".  And, you need to provide specific data set names for your source tables in the FROM clause.  I used Table1 and Table2 here, but you change it as needed.

 

proc sql;
 create table t12 AS
  select 
        coalesce(t1.ID,t2.ID) as ID,
        t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem, 
        coalesce(t1.Entrant,t2.Entrant) as Entrant, 
        coalesce(t1.Adde,t2.Adde) as Adde,
        t2.Set1, t2.Set2, t2.Set3, t2.Set4 t2.Set5,t2.Set6,t2.Set7
    from table1 t1 FULL JOIN table2 t2 on (t1.Adde = t2.Adde);
  quit;
Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
Mystik
Obsidian | Level 7

Thank you. Made the necessary changes.

 

However, what is the difference between your t1 and Table1 or t2 or Table2?

This is because i have two files/tables (Table1 and Table2) which i want to Full Join.

ChrisHemedinger
Community Manager

t1 and t2 are aliases for the full table names.  This allows us to use a shorthand in the SELECT and other clauses.

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
Mystik
Obsidian | Level 7

Thank you.

However, PROC SQL running has been on for more than 20 minutes now.

Table1 have over 58million observations and Table2 have over 16million observation. Dunno if this is the cause.

ChrisHemedinger
Community Manager

That's definitely the cause -- your result will be 74 million+ rows.  It probably would have been wise to test this in smaller subsets to make sure it generates the results you want.  The INOBS= option on PROC SQL can limit the number of records used to input -- good for quicker tests.

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
Mystik
Obsidian | Level 7

Thank you. Wish we have more people of your kind to give us clear and succinct ideas on PROC SQL.

 

After run time of 1:03:09.14 (just over an hour), 

Table_12 was created with over 324 Million+ rows (yes over 300million) and 17 columns.

 

Please, what will be the appropriate INOBS= option on PROC SQL for my scenario for quicker tests?

 

ChrisHemedinger
Community Manager

Wow - there must have been some cartesian product interactions there -- where multiple combinations of records are created.  This indicates that your JOIN criteria is probably not narrow enough.

 

I suggest that you watch this free webinar from our Ask the Experts series: SQL: An Introduction.  It's probably just about 45 minutes long, and it will help you to understand these concepts.

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
Mystik
Obsidian | Level 7

That was a very helpful Webinar!

However, in the Webinar, the demonstration was for "inner join with summarization". Even though i clearly understood the cartesian product interactions in the video, there was little said about it and its prevention in the case of FULL JOIN(which is mine situation). 

 

Is there a way to use the summary function in the FULL JOIN to prevent full blown cartesian product interaction?

 

Is the cartesian product interaction bad or its just another way of having a complex table?

 

ChrisHemedinger
Community Manager

Usually when we talk about "cartesian product" it's in the context of a query that pulls too many combinations of records -- more than we want.  You could try adding more or different join criteria (in the ON clause) for to match maybe the ID (which is usually the name for a unique key, so might be closer to what you want).

 

If you want to try with a limited set of records, use PROC SQL INOBS=1000 (or some other relatively small number) to see if it's getting close to what you want. 

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
kiranv_
Rhodochrosite | Level 12

check whether you indexes on the column you are joining on. try building indexes on columns you join.

 

Peter_C
Rhodochrosite | Level 12
See http://support.sas.com/techsup/technote/ts553.html for a paper on understanding and optimising joins

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 53525 views
  • 3 likes
  • 4 in conversation