BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shailey
Obsidian | Level 7

Hello, I am trying to merge 4 tables and I would like all the final result to contain all of the values from each table, but when I try to do a full outer join it produces way too many results. The data are structured as shown below: table a has a unique id, a zip code, and a name. Table b has a zip code and a city, table c has a zip code and zcta, and table has a zcta and some data. 

 

 

data a; 
	input id zip name $; 
	cards; 
	1 15217 Bob
	2 15217 Sue
	3 15213 Jack
	4 85001 Ginger
	5 91111 Mary
	; 
run; 

data b; 
	input zip city $; 
	cards; 
	15217 Pittsburgh
	15213 Pittsburgh 
	60290 Chicago
	10001 New York 
	85001 Phoenix
	; 
run; 

data c; 
	input zip zcta; 
	cards; 
	15217 15217
	15213 15217
	51331 51331
	60290 60290
	10001 10002
	85001 85001 
; 
run; 

data d; 
	input zcta data $; 
	cards; 
15217 v1
51331 v2
60290 v3
10002 v4
85001 v5
95201 v6
; 
run; 

proc sql; 
	create table outer as 
		select 	a.id, a.name, a.zip, 
				b.city, 
				c.zcta, 
				d.data
		from 			a as a 
		full outer join 
						(b as b, 
						c as c, 
						d as d) 
		on 				a.zip = b.zip 
		and 			a.zip = c.zip
		and 			b.zip = c.zip 
		and 			c.zcta = d.zcta; 
quit; 

 

The result I'd like would be: 

 

	input id name $ zip zcta city $ data $ ; 
	cards; 
	1 Bob 15217 15217 Pittsburgh v1
	2 Sue 15217 15217 Pittsburgh v1
	3 Jack 15213 15217 Pittsburgh v1
	4 Ginger 85001 85001 Phoenix v5
	. . 10001 10002 NewYork v4
	. . 60290 60290 Chicago v3
	. . 51331 51331 . v2
	. . . 95201 . v6
	; 
run; 

Is there a straightforward way to do this? Thank you for your help! 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Try specifying it like below:

from tablea
full join tableb on (join condition)
full join tablec on (join condition)
full join tabled on (join condition);
quit;

View solution in original post

5 REPLIES 5
Reeza
Super User
Try specifying it like below:

from tablea
full join tableb on (join condition)
full join tablec on (join condition)
full join tabled on (join condition);
quit;
shailey
Obsidian | Level 7

Absolutely fabulous! Thank you so much for your insight. 

ballardw
Super User

Since all of these are only adding one variable, I'd say the easiest is to sort them all by Zip and then use a data step merge.

Or possibly create formats for ZCTA and whatever that DATA variable stands form . If the ZIP code is supposed to be the code for the City then the column city is not needed with the ZIPCity function available.

shailey
Obsidian | Level 7

Thank you for these ideas. Actually, the adding of a single variable each dataset was only for simplicity, there are (many) other variables to merge as well. But I do like the suggestion of using a zip2zcta format to create a new variable in a dataset that contains zip rather than merging using the crosswalk table (table c). 

hbi
Quartz | Level 8 hbi
Quartz | Level 8

Hi, 

 

I included three different variations of solutions because it is unclear why Mary is being excluded. The first version includes Mary, but the latter two do not include Mary. Hope this helps. 

 

hbi

 

 

/* this version includes Mary */ 
/* comment: from your example it is unclear why Mary should be excluded */
PROC SQL;
  CREATE TABLE want_version1 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;


/* this version excludes Mary */
/* reason for excluding Mary is because: d.zcta is missing */
PROC SQL;
  CREATE TABLE want_version2 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  WHERE d.zcta IS NOT MISSING
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;


/* this version excludes Mary */
/* reason for excluding Mary is because: d.data is missing */
PROC SQL;
  CREATE TABLE want_version3 AS 
  SELECT a.id
       , a.name
       , COALESCE(a.zip, b.zip, c.zip) as zip
       , b.city
       , COALESCE(c.zcta, d.zcta) as zcta
       , d.data
  FROM a
  FULL OUTER JOIN b ON (b.zip = a.zip)
  FULL OUTER JOIN c ON (c.zip = COALESCE(a.zip, b.zip))
  FULL OUTER JOIN d ON (d.zcta = c.zcta)
  WHERE d.data IS NOT MISSING
  ORDER BY 1, 2, 3, 4, 5, 6;
QUIT;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 39972 views
  • 5 likes
  • 4 in conversation