BookmarkSubscribeRSS Feed
trevand
Obsidian | Level 7

I am using the following code to do a proc sql join:

 

proc sql;
create table data3 as select t1.*, t2*, 
coalesce (t1. join_id t2.join_id) as join_id,
case 
when not missing (t1.join_id) and not missing (t2.join_id) then 3
when       missing (t1.join_id) and not missing (t2.join_id) then 2
when not missing (t1.join_id) and       missing (t2.join_id) then 1
end as join_stat
from data1 t1 full join data2 on t1.join_id=t2.join_id;
quit;

However I am getting a lot of missing  observations in the join_id after the join which should not happen since join_id is not missing in data1 and data2. It has something to do with t1*, t2*. I get the same problem if I do select *. Is it not appropriate to select all variables in proc sql joins? Or is there a different way to do so?

 

9 REPLIES 9
ballardw
Super User

Suggestion: copy the code from your LOG along with all the notes and messages.

 

The code you show has syntax problems.

reate table data3 as select t1.*, t2*, 

The t2* would throw an error about expecting something to follow the * as the dot after t2 is missing

coalesce (t1. join_id t2.join_id) as join_id,

has a space between t1. and join_id and behavior could be problematic and coalesce expects commas between values.

The Log will tell us the actual code submitted and if the issues I see are part of your problem.

For example if the log doesn't show a problem with the coalesce it is possible that t1. was treated as a missing variable (unless you have a variable by that name), join_id without a table name would likely throw an ambiguous reference note and if t2.join_id is missing then all three values could be treated as missing.

When you join on variables both have to have the same value so 'coalese (t1.join_id, t2_join_id) would be the value of t1.join_id regardless.

If you expect to see t1.join_id and t2.join_id with different values you can't join on them being equal.

Did you mean a "Full OUTER Join" which is bit different?

 

What selecting all variables from two data sets will do is report multiple variables of the same name with only one in the output.

An example. Since I am joining a set with itself all of the variables are in the output.

2481  proc sql;
2482     create table junk as
2483     select t1.*, t2.*
2484     from sashelp.class as t1
2485          full join
2486          sashelp.class as t2
2487          on t1.name = t2.name
2488     ;
WARNING: Variable Name already exists on file USER.JUNK.
WARNING: Variable Sex already exists on file USER.JUNK.
WARNING: Variable Age already exists on file USER.JUNK.
WARNING: Variable Height already exists on file USER.JUNK.
WARNING: Variable Weight already exists on file USER.JUNK.
NOTE: Table USER.JUNK created, with 19 rows and 5 columns.

2489  quit;
trevand
Obsidian | Level 7

Fair point. I cannot copy paste my code. So I had to type it and had typos.

Patrick
Opal | Level 21

@trevand wrote:

Fair point. I cannot copy paste my code. So I had to type it and had typos.


I encountered such copy/paste restrictions in the past connecting to a SAS managed (hosted) environment. ...until someone showed me that all that was missing was some change of the default client setting... 

For your question: Might be worth to share some representative sample data and show us the desired outcome. I've created such sample data below. If doing a full join one of the questions are: What is the relationship between the two tables? Can there be many:many relationships and what do you want to happen in such cases? What's the desired result?

data have1;
  input join_id h1;
  datalines;
1 1
2 1
3 1
3 2
4 1
4 2
;
data have2;
  input join_id h2;
  datalines;
1 1
3 1
3 2
5 1
5 2
;

proc sql;
/*  create table data3 as */
  select 
    coalesce(t1.join_id, t2.join_id) as join_id,
    t1.join_id as t1_join_id,
    t2.join_id as t2_join_id,
  case 
    when not missing (t1.join_id) and not missing (t2.join_id) then 3
    when       missing (t1.join_id) and not missing (t2.join_id) then 2
    when not missing (t1.join_id) and       missing (t2.join_id) then 1
else 0 end as join_stat, t1.h1, t2.h2 from have1 t1 full join have2 t2 on t1.join_id=t2.join_id ; quit;

Patrick_0-1726704703290.png

 

 

Tom
Super User Tom
Super User

A SAS dataset cannot have multiple variable with the same name so doing something like:

select a.*,b.*,coaleasce(a.id,b.id) as id
from a full join b
on a.id = b.id
;

Will try to select THREE variables that are all named ID.

PROC SQL will only keep the FIRST one.

So if you did want to do your join you need to move those t1.* and t2.* to the END of the list of variables.

 

trevand
Obsidian | Level 7

I'm doing proc sql full join because I want to have merged and non merged observations and proc sql is faster than data step merge. I see your point. If I put t1.* and t2.* at the end of variable list I get the same problem. I guess you can't select all variables if you also reference some of those variables in coalesce.

Tom
Super User Tom
Super User

@trevand wrote:

I'm doing proc sql full join because I want to have merged and non merged observations and proc sql is faster than data step merge. I see your point. If I put t1.* and t2.* at the end of variable list I get the same problem. I guess you can't select all variables if you also reference some of those variables in coalesce.


Huh?  If you coalesce the ID variables as the FIRST variable in the dataset then the original variables (which will show up later in the list of variables) will not be saved, but you don't need those since you have your status flag variable you calculated.

 

Are you saying there are other variables (besides the id variable) that have the same name in both datasets?   If so you can use RENAME= dataset options on the source datasets to give them unique names.

Example:

from data1(rename=(commonvar=commonvar1)) A
full join data2(rename=(commonvar=commonvar2)) B
Tom
Super User Tom
Super User

Can you explain what you are trying to do?

Do you expect to have multiple observations per value of JOIN_ID in BOTH of those datasets?

If not then forget the SQL and just do normal SAS code.

data data3;
  merge data1(in=in1) data2(in=in2);
  by join_id;
  join_stat = in1+2*in2;
run;
trevand
Obsidian | Level 7

Exactly, this is one of the problems. I can have multiple values per join_id. I tried again to put t1.* and t2.* at the end of variable list or even after the case I still get some of the observations missing.

 

proc sql;
create table data3 as  select 
coalesce(t1.join_id, t2.join_id) as join_id,   t1.*,  t2.*
case 
when not missing (t1.join_id) and not missing (t2.join_id) then 3
when       missing (t1.join_id) and not missing (t2.join_id) then 2
when not missing (t1.join_id) and       missing (t2.join_id) then 1    else 0
 end  as join_stat,
from data1 full join data2  on t1.join_id=t2.join_id;
quit;

 

or

 

proc sql;
create table data3 as  select 
  coalesce(t1.join_id, t2.join_id) as join_id,
  case 
   when not missing (t1.join_id) and not missing (t2.join_id) then 3
   when       missing (t1.join_id) and not missing (t2.join_id) then 2
   when not missing (t1.join_id) and       missing (t2.join_id) then 1    else 0
  end  as join_stat,
  t1.*,  t2.*
  from data1 full join data2  on t1.join_id=t2.join_id;
quit;
Patrick
Opal | Level 21

@trevand DO NOT use the asterix if you have same named variables in more than one of the source tables. Either rename the variables in source before the SQL join or even better list them explicitly in the SQL Select statement and give the variables unique names.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 490 views
  • 2 likes
  • 4 in conversation