Solved: Re: PROC SQL Join error?

ybz12003 · Posted 05-23-2018 02:00 PM

Hi all:

I know it looks a little crazy and greedy. I was trying to use PROC SQL inner join for merge 12 tables in ONE based on their same ID. However, the code didn't work. Any suggestion? Thanks.

PROC SQL;
	CREATE TABLE test AS
	SELECT a.ID,
  a.Display_label,
  a.Crowdingpercent&year.,
  b.LowEducationpercent&year.,
  b.
.
. 
 l.AffordHousing_M&year.,
  l.AffordHousing_NoM&year.
	FROM crowding&year.  a inner JOIN
	education&year.  b inner JOIN
  .
.
.
.
.
  
  affordableHousing&year.  l
	ON a.ID = b.ID = c.ID = d.ID = e.ID = f.ID = g.ID = h.ID = i.ID = j.ID = k.ID = l.ID
	ORDER BY ID;
QUIT;

SuryaKiran · Posted 05-23-2018 02:09 PM

Check you syntax:

	FROM crowding&year.  a 
		inner JOIN education&year.  b ON a.ID = b.ID
		inner JOIN workercls&year.  c ON a.ID = c.ID
		inner JOIN occupation&year. d ON a.ID = d.ID 
		inner JOIN poverty&year.    e ON a.ID = e.ID
                ..............So on

Thanks,
Suryakiran

View solution in original post

SuryaKiran · Posted 05-23-2018 02:09 PM

Check you syntax:

	FROM crowding&year.  a 
		inner JOIN education&year.  b ON a.ID = b.ID
		inner JOIN workercls&year.  c ON a.ID = c.ID
		inner JOIN occupation&year. d ON a.ID = d.ID 
		inner JOIN poverty&year.    e ON a.ID = e.ID
                ..............So on

Thanks,
Suryakiran

ybz12003 · Posted 05-23-2018 03:20 PM

Thanks, Suryakiran! It works.

ybz12003 · Posted 05-23-2018 02:35 PM

Thanks. However, I found the ID are duplicated in the new combined table. I have to add NODUPKEY step to fix it. Is there a way to just do it in SQL step? Thanks.

mkeintz · Posted 05-23-2018 02:42 PM

Then at least one of your initial tables must have a duplicate ID, right? It might be better to first eliminate duplicates from the contributing table if you can come up with a criterion that doesn't need data from other tables.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SuryaKiran · Posted 05-23-2018 02:49 PM

You can remove duplicated by DISTINCT keyword in sub-query. Make sure you have all the columns selected in sub-query that are referenced in main query.

FROM crowding&year.  a 
		inner JOIN ( select distinct Display_label,Crowdingpercent&year.
                                   FROM education&year.) b ON a.ID = b.ID 
                inner JOIN workercls&year. c ON a.ID = c.ID 
                inner JOIN occupation&year. d ON a.ID = d.ID 
                inner JOIN poverty&year. e ON a.ID = e.ID ..............So on

Thanks,
Suryakiran

mkeintz · Posted 05-23-2018 03:16 PM

@SuryaKiran:

I believe your suggestion would generate an

"ERROR: Column ID could not be found in the table/view identified with the correlation name b."

message, because the sub-query doesn't include ID.

And if the subquery did include ID,as in

(select distinct ID, Display_label,Crowdingpercent&year. FROM education&year.)

the OP could still get duplicate ID's in the resulting table, unless it is known that all instances of duplicate ID's in the education&year table will also have duplicate Display_label,Crowdingpercent&year values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SuryaKiran · Posted 05-23-2018 03:28 PM

@mkeintz Your right. If you see my note I mentioned Make sure you have all the columns selected in sub-query that are referenced in main query.

Thanks,
Suryakiran

Reeza · Posted 05-23-2018 03:02 PM

Could you append the data sets instead, add in a year column and then use TRANSPOSE to create the giant wide data set you want? The wide data set would be painful to work with anyways, so in general a vertical stacked data set may be preferable.

@ybz12003 wrote:

Hi all:

I know it looks a little crazy and greedy. I was trying to use PROC SQL inner join for merge 12 tables in ONE based on their same ID. However, the code didn't work. Any suggestion? Thanks.

PROC SQL;
	CREATE TABLE test AS
	SELECT a.ID,
  a.Display_label,
  a.Crowdingpercent&year.,
  b.LowEducationpercent&year.,
  b.HighEducationpercent&year.,
  c.LowworkclsPercent&year.,
  d.Occupationpercent&year.,
  e.PovertyPercent&year.,
  f.RentalOccupancyPercent&year.,
  g.WealthHomepercent&year.,
  h.Unemploypercent&year.,
  i.MedianIncome&year.,
  j.Language2ndPercent&year.,
  k.White__P&year.,
  k.AA_P&year.,
  k.AIAN__P&year.,
  k.__P&year.,
  k.PI_P&year.,
  k.Other__P&year.,
  k.TwoPlus_P&year.,
  l.AffordHousing_M&year.,
  l.AffordHousing_NoM&year.
	FROM crowding&year.  a inner JOIN
	education&year.  b inner JOIN
  workercls&year.  c inner JOIN
  occupation&year.  d inner JOIN
  poverty&year.  e inner JOIN
  rentaloccupancy&year.  f inner JOIN
  wealthhome&year.  g inner JOIN
  unemploy&year.  h inner JOIN
  medianincome&year.  i inner JOIN
  language&year.  j inner JOIN
  race&year.  k inner JOIN
  affordableHousing&year.  l
	ON a.ID = b.ID = c.ID = d.ID = e.ID = f.ID = g.ID = h.ID = i.ID = j.ID = k.ID = l.ID
	ORDER BY ID;
QUIT;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!