BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shailey
Obsidian | Level 7

Hello, I am trying to merge 4 tables and I would like all the final result to contain all of the values from each table, but when I try to do a full outer join it produces way too many results. The data are structured as shown below: table a has a unique id, a zip code, and a name. Table b has a zip code and a city, table c has a zip code and zcta, and table has a zcta and some data. 

 

 

data a; 
	input id zip name $; 
	cards; 
	1 15217 Bob
	2 15217 Sue
	3 15213 Jack
	4 85001 Ginger
	5 91111 Mary
	; 
run; 

data b; 
	input zip city $; 
	cards; 
	15217 Pittsburgh
	15213 Pittsburgh 
	60290 Chicago
	10001 New York 
	85001 Phoenix
	; 
run; 

data c; 
	input zip zcta; 
	cards; 
	15217 15217
	15213 15217
	51331 51331
	60290 60290
	10001 10002
	85001 85001 
; 
run; 

data d; 
	input zcta data $; 
	cards; 
15217 v1
51331 v2
60290 v3
10002 v4
85001 v5
95201 v6
; 
run; 

proc sql; 
	create table outer as 
		select 	a.id, a.name, a.zip, 
				b.city, 
				c.zcta, 
				d.data
		from 			a as a 
		full outer join 
						(b as b, 
						c as c, 
						d as d) 
		on 				a.zip = b.zip 
		and 			a.zip = c.zip
		and 			b.zip = c.zip 
		and 			c.zcta = d.zcta; 
quit; 

 

The result I'd like would be: 

 

	input id name $ zip zcta city $ data $ ; 
	cards; 
	1 Bob 15217 15217 Pittsburgh v1
	2 Sue 15217 15217 Pittsburgh v1
	3 Jack 15213 15217 Pittsburgh v1
	4 Ginger 85001 85001 Phoenix v5
	. . 10001 10002 NewYork v4
	. . 60290 60290 Chicago v3
	. . 51331 51331 . v2
	. . . 95201 . v6
	; 
run; 

Is there a straightforward way to do this? Thank you for your help! 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Try specifying it like below:

from tablea
full join tableb on (join condition)
full join tablec on (join condition)
full join tabled on (join condition);
quit;

View solution in original post

5 REPLIES 5
Reeza
Super User
Try specifying it like below:

from tablea
full join tableb on (join condition)
full join tablec on (join condition)
full join tabled on (join condition);
quit;
shailey
Obsidian | Level 7

Absolutely fabulous! Thank you so much for your insight. 

ballardw
Super User

Since all of these are only adding one variable, I'd say the easiest is to sort them all by Zip and then use a data step merge.

Or possibly create formats for ZCTA and whatever that DATA variable stands form . If the ZIP code is supposed to be the code for the City then the column city is not needed with the ZIPCity function available.

shailey
Obsidian | Level 7

Thank you for these ideas. Actually, the adding of a single variable each dataset was only for simplicity, there are (many) other variables to merge as well. But I do like the suggestion of using a zip2zcta format to create a new variable in a dataset that contains zip rather than merging using the crosswalk table (table c). 

hbi
Quartz | Level 8 hbi
Quartz | Level 8

Hi, 

 

I included three different variations of solutions because it is unclear why Mary is being excluded. The first version includes Mary, but the latter two do not include Mary. Hope this helps. 

 

hbi

 

 

/* this version includes Mary */ 
/* comment: from your example it is unclear why Mary should be excluded */
PROC SQL;
  CREATE TABLE want_version1 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;


/* this version excludes Mary */
/* reason for excluding Mary is because: d.zcta is missing */
PROC SQL;
  CREATE TABLE want_version2 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  WHERE d.zcta IS NOT MISSING
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;


/* this version excludes Mary */
/* reason for excluding Mary is because: d.data is missing */
PROC SQL;
  CREATE TABLE want_version3 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  WHERE d.data IS NOT MISSING
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 40697 views
  • 5 likes
  • 4 in conversation