Proc sql full join

Reply
Frequent Contributor
Posts: 117

Proc sql full join

Hi Experts,

 

 I have two data sets both data sets have a cust_id  variable, and am trying bring all the information from both the data sets.


data debit_card_customers;
input cust_id debit_acct_number name$ dc_open_date dc_balance;
informat dc_open_date date9.;
format dc_open_date date9.;
datalines;
111 123456 anand 12jun2000 1000
222 234567 srikant 20sep2013 2000
444 456789 prathap 20mar2012 5000
333 345678 vishwa 29aug2013 7500
;
run;

 

data credit_card_customers;
input cust_id credit_acct_number cc_open_date cc_balance;
informat cc_open_date date9.;
format cc_open_date date9.;
datalines;
444 444444 01jul2016 27000
555 555555 20jun2016 33000
111 111111 01jun2015 23450
;
run;

 

Below is my code.


proc sql;
create table test as
select * from debit_card_customers as a
full join credit_card_customers as b
on a.cust_id=b.cust_id;
quit;

 

 

I want to bring  all the information from both the data sets, but in my resulting data set cust_id=555 is missing.Please suggest

 

Thanks & regards,

Sanjay

 

Super User
Posts: 7,853

Re: Proc sql full join

[ Edited ]

You are obviously not reading the log (or not interpreting it correctly), as the log contains a clear pointer to the problem:

51         select * from debit_card_customers as a
52         full join credit_card_customers as b
53         on a.cust_id=b.cust_id;
WARNING: Variable cust_id existiert bereits in Datei WORK.TEST.

(You'll have the same WARNING in English).

When you use the asterisk, and have variables common to both datasets, one of them has to take precedence. By default, it is the first one encountered, so you only get a.cust_id in the output. Since there is no cust_id = 555 present in dataset debit_card_customers (a), you get a missing value.

Moral: don't use the asterisk in SQL select statements, but exhaustive lists of the variables you want to keep.

And make use of the coalesce() function:

proc sql;
create table test as
select 
  coalesce(a.cust_id,b.cust_id) as cust_id,
  debit_acct_number,
  name,
  dc_open_date,
  dc_balance,
  credit_acct_number,
  cc_open_date,
  cc_balance
from debit_card_customers as a
full join credit_card_customers as b
on a.cust_id=b.cust_id;
quit;

 And with a 1:1 relationship, as you have here, a dats step solution will be more appropriate:

proc sort data=debit_card_customers;
by cust_id;
run;

proc sort data=credit_card_customers;
by cust_id;
run;

data test2;
merge
  debit_card_customers
  credit_card_customers
;
by cust_id;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 1 reply
  • 78 views
  • 0 likes
  • 2 in conversation