07-27-2017 07:02 AM
I have two data sets both data sets have a cust_id variable, and am trying bring all the information from both the data sets.
input cust_id debit_acct_number name$ dc_open_date dc_balance;
informat dc_open_date date9.;
format dc_open_date date9.;
111 123456 anand 12jun2000 1000
222 234567 srikant 20sep2013 2000
444 456789 prathap 20mar2012 5000
333 345678 vishwa 29aug2013 7500
input cust_id credit_acct_number cc_open_date cc_balance;
informat cc_open_date date9.;
format cc_open_date date9.;
444 444444 01jul2016 27000
555 555555 20jun2016 33000
111 111111 01jun2015 23450
Below is my code.
create table test as
select * from debit_card_customers as a
full join credit_card_customers as b
I want to bring all the information from both the data sets, but in my resulting data set cust_id=555 is missing.Please suggest
Thanks & regards,
07-27-2017 07:43 AM - edited 07-27-2017 07:48 AM
You are obviously not reading the log (or not interpreting it correctly), as the log contains a clear pointer to the problem:
51 select * from debit_card_customers as a 52 full join credit_card_customers as b 53 on a.cust_id=b.cust_id; WARNING: Variable cust_id existiert bereits in Datei WORK.TEST.
(You'll have the same WARNING in English).
When you use the asterisk, and have variables common to both datasets, one of them has to take precedence. By default, it is the first one encountered, so you only get a.cust_id in the output. Since there is no cust_id = 555 present in dataset debit_card_customers (a), you get a missing value.
Moral: don't use the asterisk in SQL select statements, but exhaustive lists of the variables you want to keep.
And make use of the coalesce() function:
proc sql; create table test as select coalesce(a.cust_id,b.cust_id) as cust_id, debit_acct_number, name, dc_open_date, dc_balance, credit_acct_number, cc_open_date, cc_balance from debit_card_customers as a full join credit_card_customers as b on a.cust_id=b.cust_id; quit;
And with a 1:1 relationship, as you have here, a dats step solution will be more appropriate:
proc sort data=debit_card_customers; by cust_id; run; proc sort data=credit_card_customers; by cust_id; run; data test2; merge debit_card_customers credit_card_customers ; by cust_id; run;