DATA Step, Macro, Functions and more

proc sql full outer join multiple tables with missing values

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

proc sql full outer join multiple tables with missing values

[ Edited ]

Hello, I am trying to merge 4 tables and I would like all the final result to contain all of the values from each table, but when I try to do a full outer join it produces way too many results. The data are structured as shown below: table a has a unique id, a zip code, and a name. Table b has a zip code and a city, table c has a zip code and zcta, and table has a zcta and some data. 

 

 

data a; 
	input id zip name $; 
	cards; 
	1 15217 Bob
	2 15217 Sue
	3 15213 Jack
	4 85001 Ginger
	5 91111 Mary
	; 
run; 

data b; 
	input zip city $; 
	cards; 
	15217 Pittsburgh
	15213 Pittsburgh 
	60290 Chicago
	10001 New York 
	85001 Phoenix
	; 
run; 

data c; 
	input zip zcta; 
	cards; 
	15217 15217
	15213 15217
	51331 51331
	60290 60290
	10001 10002
	85001 85001 
; 
run; 

data d; 
	input zcta data $; 
	cards; 
15217 v1
51331 v2
60290 v3
10002 v4
85001 v5
95201 v6
; 
run; 

proc sql; 
	create table outer as 
		select 	a.id, a.name, a.zip, 
				b.city, 
				c.zcta, 
				d.data
		from 			a as a 
		full outer join 
						(b as b, 
						c as c, 
						d as d) 
		on 				a.zip = b.zip 
		and 			a.zip = c.zip
		and 			b.zip = c.zip 
		and 			c.zcta = d.zcta; 
quit; 

 

The result I'd like would be: 

 

	input id name $ zip zcta city $ data $ ; 
	cards; 
	1 Bob 15217 15217 Pittsburgh v1
	2 Sue 15217 15217 Pittsburgh v1
	3 Jack 15213 15217 Pittsburgh v1
	4 Ginger 85001 85001 Phoenix v5
	. . 10001 10002 NewYork v4
	. . 60290 60290 Chicago v3
	. . 51331 51331 . v2
	. . . 95201 . v6
	; 
run; 

Is there a straightforward way to do this? Thank you for your help! 


Accepted Solutions
Solution
‎11-04-2015 03:29 PM
Super User
Posts: 19,822

Re: proc sql full outer join multiple tables with missing values

Try specifying it like below:

from tablea
full join tableb on (join condition)
full join tablec on (join condition)
full join tabled on (join condition);
quit;

View solution in original post


All Replies
Solution
‎11-04-2015 03:29 PM
Super User
Posts: 19,822

Re: proc sql full outer join multiple tables with missing values

Try specifying it like below:

from tablea
full join tableb on (join condition)
full join tablec on (join condition)
full join tabled on (join condition);
quit;
Contributor
Posts: 23

Re: proc sql full outer join multiple tables with missing values

Absolutely fabulous! Thank you so much for your insight. 

Super User
Posts: 11,343

Re: proc sql full outer join multiple tables with missing values

Since all of these are only adding one variable, I'd say the easiest is to sort them all by Zip and then use a data step merge.

Or possibly create formats for ZCTA and whatever that DATA variable stands form . If the ZIP code is supposed to be the code for the City then the column city is not needed with the ZIPCity function available.

Contributor
Posts: 23

Re: proc sql full outer join multiple tables with missing values

Thank you for these ideas. Actually, the adding of a single variable each dataset was only for simplicity, there are (many) other variables to merge as well. But I do like the suggestion of using a zip2zcta format to create a new variable in a dataset that contains zip rather than merging using the crosswalk table (table c). 

Contributor hbi
Contributor
Posts: 66

Re: proc sql full outer join multiple tables with missing values

Hi, 

 

I included three different variations of solutions because it is unclear why Mary is being excluded. The first version includes Mary, but the latter two do not include Mary. Hope this helps. 

 

hbi

 

 

/* this version includes Mary */ 
/* comment: from your example it is unclear why Mary should be excluded */
PROC SQL;
  CREATE TABLE want_version1 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;


/* this version excludes Mary */
/* reason for excluding Mary is because: d.zcta is missing */
PROC SQL;
  CREATE TABLE want_version2 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  WHERE d.zcta IS NOT MISSING
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;


/* this version excludes Mary */
/* reason for excluding Mary is because: d.data is missing */
PROC SQL;
  CREATE TABLE want_version3 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  WHERE d.data IS NOT MISSING
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 5673 views
  • 3 likes
  • 4 in conversation