DATA Step, Macro, Functions and more

proc sql: how to do full join, keep all columns and rows from these two datasets

Reply
Super Contributor
Posts: 336

proc sql: how to do full join, keep all columns and rows from these two datasets

[ Edited ]
proc sql;
 create table want as
 select date, dea (dec.id=tap.id) as indic,
 (dec.id ~=".") as decid,
 (tap.id ~=".") as tapid,
 coalesce (dec.id, tap.id) as id
 from dec full join tap on dec.id=tap.id;
 quit;

I have two datasets that I want to merge. dataset dec has 240000 obs, dataset tap has 14000 obs. They do have 13000 in common.

 

I want to merge these two. My above code can generate a table, but cannot keep all these columns that I need.

I can keep all columns and all obs, that is 24000. 

I have 40 varibles, that I prefer to keep 15 variables of these 40 from dataset tap , and want to keep 5 varibles from dec. So it is not wise to type. Among them, some IDs are common.

Can anybody give advice?

Thanks.

PROC Star
Posts: 1,561

Re: proc sql: how to do full join, keep all columns and rows from these two datasets

[ Edited ]

If you use SQL, you have no option but to type each variable name, unless you just use *.

 

An option around this constraint is to use the keep= data set option to reduce the number of variables.

keep= allows you to use the colon, dash and double-dash list shortcuts such as    VAR:   or   VAR1-VAR10    or   VARA--ID .

 

If you use a data step merge instead of SQL, you have more flexibility in terms of implicitly naming variables, and you can monitor and count which observations for which table are kept or dropped.

 

 

Super Contributor
Posts: 336

Re: proc sql: how to do full join, keep all columns and rows from these two datasets

[ Edited ]

Thanks. But the problem is, I want to keep COALESCE function, and I want to keep all these 240000 obs

if I just use a regular full join, I do not know how to add coalesce funtion,

when i use

on a.ID=b.ID, I am afriad that I lose obs

 

PROC Star
Posts: 1,561

Re: proc sql: how to do full join, keep all columns and rows from these two datasets

This reply is very confusing:

-  the coalesce function is not used for the join

- you already use   on a.id = b.id

Please provide an example with 5-record input tables and the expected output.

 

Ask a Question
Discussion stats
  • 3 replies
  • 342 views
  • 2 likes
  • 2 in conversation