06-14-2016 05:06 PM - edited 06-14-2016 05:18 PM
proc sql; create table want as select date, dea (dec.id=tap.id) as indic, (dec.id ~=".") as decid, (tap.id ~=".") as tapid, coalesce (dec.id, tap.id) as id from dec full join tap on dec.id=tap.id; quit;
I have two datasets that I want to merge. dataset dec has 240000 obs, dataset tap has 14000 obs. They do have 13000 in common.
I want to merge these two. My above code can generate a table, but cannot keep all these columns that I need.
I can keep all columns and all obs, that is 24000.
I have 40 varibles, that I prefer to keep 15 variables of these 40 from dataset tap , and want to keep 5 varibles from dec. So it is not wise to type. Among them, some IDs are common.
Can anybody give advice?
06-14-2016 07:13 PM - edited 06-14-2016 08:17 PM
If you use SQL, you have no option but to type each variable name, unless you just use *.
An option around this constraint is to use the keep= data set option to reduce the number of variables.
keep= allows you to use the colon, dash and double-dash list shortcuts such as VAR: or VAR1-VAR10 or VARA--ID .
If you use a data step merge instead of SQL, you have more flexibility in terms of implicitly naming variables, and you can monitor and count which observations for which table are kept or dropped.
06-15-2016 10:01 AM - edited 06-15-2016 10:02 AM
Thanks. But the problem is, I want to keep COALESCE function, and I want to keep all these 240000 obs
if I just use a regular full join, I do not know how to add coalesce funtion,
when i use
on a.ID=b.ID, I am afriad that I lose obs
06-15-2016 07:08 PM
This reply is very confusing:
- the coalesce function is not used for the join
- you already use on a.id = b.id
Please provide an example with 5-record input tables and the expected output.