Solved: LEft joining same tables multiple times

bhca60 · Posted 02-23-2024 02:35 PM

I have some archived tables that need to be brought in and joined on a list of ids. For the SELECT statement, do I need to do b*, c*, d*? or will b* suffice?

PROC SQL;
     CREATE TABLE work.PRE1  AS
        SELECT b.*
        from tko.list as a
left join abr.table1_20 as b
on a.id=b.id
left join  abr.table1_21 as c
on a.id=c.id
left join  abr.table1_22 as d
on a.id=d.id
;
quit;

SASKiwi · Posted 02-26-2024 06:29 PM

I suspect this logic will get you closer to what you want:

PROC SQL;
 CREATE TABLE work.PRE1 AS

 SELECT b.*
 from tko.list as a

left join 
(select *
 from abr.table1_20
 union 
 select * 
 from abr.table1_21
 union
 select *
 from abr.table1_22
) as b
on a.id=b.id
;
quit;

View solution in original post

mkeintz · Posted 02-23-2024 02:54 PM

Your subject line says (emphasis mine) "

LEft joining same tables multiple times."

But you aren't joining a table to itself. In what way are these the "same" tables?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

bhca60 · Posted 02-23-2024 03:12 PM

Same meaning they have the same columns. Thats why I was just doing b*.

PaigeMiller · Posted 02-23-2024 03:13 PM

@bhca60 wrote:

Same meaning they have the same columns. Thats why I was just doing b*.

Then don't you want to do a UNION and not a left join?

And again, please note you want b.* with a dot in there.

--
Paige Miller

bhca60 · Posted 02-23-2024 03:18 PM

So if i do left joins, I can just do the asterisk without specifying the table. But will that duplicate columns? How would I do a union?

PaigeMiller · Posted 02-24-2024 06:45 AM

@bhca60 wrote:

So if i do left joins, I can just do the asterisk without specifying the table. But will that duplicate columns? How would I do a union?

The question is not ... how to do a UNION? The question is should you be doing a UNION instead of a left join?

So, before we go any further in this discussion, please describe (or better yet show us) an example of the output you want. You tell us you want to do LEFT JOINs but I (and others) don't think that's what you really want. As we move forward, you should be showing us (or describing) the output you want in every thread you post, and it should be in the first post in that thread.

Do you want tables A B C D to be combined horizontally, as shown by @yabwon ? This is impossible without renaming some of the variables. Or do you want tables A B C D to be stacked on top of each other vertically, in which case you will not need to change names at all?

You told us all of the variable names are the same in each data set. Stacking the data sets on top of one another vertically make sense when all the variable names are the same. Combining horizontally with a left join only makes sense if you change the names of the variables in each data set, and even if you were to do the renames, you would end up with a wide data set rather than a long data set, and SAS works much better with long data sets rather than wide. So combining horizontally is not recommended (by me, anyway).

This thread is a good example of the XY problem, where a user thinks they want to do a certain task, and then gets stuck in a specific piece of code and asks only about the code (in this case, left joins). A much better approach is to explain the big picture: explain why you are doing this and what the end goal is (for example the end goal is to perform a certain statistical analysis or create a certain output). Why is this a better approach? Because there may be much better and much easier ways to get to the end goal than the one that you are using which you are having problems with.

--
Paige Miller

PaigeMiller · Posted 02-23-2024 03:04 PM

For the SELECT statement, do I need to do b*, c*, d*? or will b* suffice?

Please explain the problem in a lot more detail.

I am wondering if you want to do a UNION of the data sets and not a left join?

I think you mean b.*, note the dot there

You can use just plain * and this takes all variables from all data sets. If you just use b.* you only get the variables from data set b. Which do you want?

--
Paige Miller

bhca60 · Posted 02-23-2024 03:12 PM

They have the same columns in those data sets - exactly the same columns.

yabwon · Posted 02-23-2024 03:37 PM

lets assume your data looks like this:

libname tko (work);
libname abr (work);

data tko.list;
  do id = 1 to 10;
    a=1;
    b=2;
    output;
  end;
run;

data abr.table1_20;
  do id = 1 to 5;
    a=3;
    b=4;
    output;
  end;
run;

data abr.table1_21;
  do id = 3 to 7;
    a=5;
    b=6;
    output;
  end;
run;

data abr.table1_22;
  do id = 6 to 10;
    a=7;
    b=8;
    output;
  end;
run;

With your request to left join tables with the same structure you will have to not only write a.*,b.*,c.*, d.* but you will have to also rename variables.

Code with only:

PROC SQL;
 CREATE TABLE work.PRE1  AS
    SELECT a.*,b.*,c.*,d.*
    from tko.list as a
    left join abr.table1_20 as b
    on a.id=b.id
    left join  abr.table1_21 as c
    on a.id=c.id
    left join  abr.table1_22 as d
    on a.id=d.id
    ;
quit;

will result with the following log:

1    PROC SQL;
2     CREATE TABLE work.PRE1  AS
3        SELECT a.*,b.*,c.*,d.*
4        from tko.list as a
5        left join abr.table1_20 as b
6        on a.id=b.id
7        left join  abr.table1_21 as c
8        on a.id=c.id
9        left join  abr.table1_22 as d
10       on a.id=d.id
11       ;
WARNING: Variable id already exists on file WORK.PRE1.
WARNING: Variable a already exists on file WORK.PRE1.
WARNING: Variable b already exists on file WORK.PRE1.
WARNING: Variable id already exists on file WORK.PRE1.
WARNING: Variable a already exists on file WORK.PRE1.
WARNING: Variable b already exists on file WORK.PRE1.
WARNING: Variable id already exists on file WORK.PRE1.
WARNING: Variable a already exists on file WORK.PRE1.
WARNING: Variable b already exists on file WORK.PRE1.
NOTE: Table WORK.PRE1 created, with 10 rows and 3 columns.

12   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

If you decide to comment out CREATE TABLE and produce only output:

PROC SQL;
 /*CREATE TABLE work.PRE1  AS*/
    SELECT a.*,b.*,c.*,d.*
    from 
    ...
quit;

you will get something like:

If your table IS NOT TO BIG, me, as a lazy programmer, would use the trick with ODS OUTPUT:

ods trace on; ods output SQL_Results = work.PRE1;
PROC SQL;
 /*CREATE TABLE work.PRE1  AS*/
    SELECT a.*,b.*,c.*,d.*
    from ...
    ;
quit;

It will automatically rename variables names for you:

Output Added:
-------------
Name:       SQL_Results
Label:      Query Results
Template:   Base.SQL
Path:       SQL.SQL_Results
-------------
NOTE: Variable id already exists on file WORK.PRE1, using id2 instead.
NOTE: Variable a already exists on file WORK.PRE1, using a2 instead.
NOTE: Variable b already exists on file WORK.PRE1, using b2 instead.
NOTE: Variable id already exists on file WORK.PRE1, using id3 instead.
NOTE: Variable a already exists on file WORK.PRE1, using a3 instead.
NOTE: Variable b already exists on file WORK.PRE1, using b3 instead.
NOTE: Variable id already exists on file WORK.PRE1, using id4 instead.
NOTE: Variable a already exists on file WORK.PRE1, using a4 instead.
NOTE: Variable b already exists on file WORK.PRE1, using b4 instead.
NOTE: The data set WORK.PRE1 has 10 observations and 12 variables.
14   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.03 seconds

But if the table is BIG, I would avoid ODS trick, and produce full explicit list of variables.

If you are lazy to type them all try use FEEDBACK option:

PROC SQL feedback noexec;
 /*CREATE TABLE work.PRE1  AS*/
    SELECT a.*,b.*,c.*,d.*
    from ...
    ;
quit;

In the log you will see something like this:

NOTE: Statement transforms to:

        select A.id, A.a, A.b, B.id, B.a, B.b, C.id, C.a, C.b, D.id, D.a, D.b
          from TKO.LIST A left outer join ABR.TABLE1_20 B on A.id = B.id left outer join
ABR.TABLE1_21 C on A.id = C.id left outer join ABR.TABLE1_22 D on A.id = D.id;

NOTE: Statement not executed due to NOEXEC option.
31   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

With a bit of creativity and help of Notepad++ you can fast edit the code.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 02-23-2024 03:48 PM

One more fast trick to produce list of variables with changed names:

proc transpose data=tko.list(obs=0) out=work.varList;
var _all_;
run;
data _null_;
  set work.varList(rename=_name_=n);
  put 
  ",a." n " as a_" n /
  ",b." n " as b_" n /
  ",c." n " as c_" n /
  ",d." n " as d_" n /
  ;
run;

Check the log.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

bhca60 · Posted 02-23-2024 04:51 PM

Thank you; so the b , c, and d tables all have the same columns and I want to bring all their columns. "A " table has only two columns and that's id and id_key. I think if I just do b.*, c.*, d.* i should be good. Each table has id, dttm, age columns.

Tom · Posted 02-23-2024 07:38 PM

So what is the purpose of the combination?
Is it to recode the variable ID_KEY into ID (or perhaps the other way around)?
Is it to subset the results to just the set of ID (or ID_KEY) values that appear in the LOOKUP table?

Or some combination of the two?

How big is the dataset that just has ID and ID_KEY? WHy not just convert it to a FORMAT and then your code might be as simple as:

data want;
  set b c d ;
  id = put(id_key, $id_key.);
  drop id_key;
run;

bhca60 · Posted 02-23-2024 08:00 PM

I just want to know after doing left joins , in the Select statement, do i have to do b.*,c.*,d.* ? I assume I do if I want all columns, I just thought it would cause duplication buti dont think so.

Tom · Posted 02-23-2024 10:55 PM

@bhca60 wrote:

I just want to know after doing left joins , in the Select statement, do i have to do b.*,c.*,d.* ? I assume I do if I want all columns, I just thought it would cause duplication buti dont think so.

@yabwon Already answered that. The answer was NO you cannot do it that way. Because a SAS dataset can only have one variable with the same name.

If B and C and D all have a variable named DATE and you tell SQL to take ALL of the variables from ALL of the datasets then the first variable named DATE in the column list implied by the * wildcard(s) will be taken and the ones coming from the other datasets will be ignored/excluded.

The code you posted originally is very strange looking and without any context of what you were actually trying to accomplish everyone was having a hard time understanding what you were asking about.

SASKiwi · Posted 02-26-2024 06:29 PM

I suspect this logic will get you closer to what you want:

PROC SQL;
 CREATE TABLE work.PRE1 AS

 SELECT b.*
 from tko.list as a

left join 
(select *
 from abr.table1_20
 union 
 select * 
 from abr.table1_21
 union
 select *
 from abr.table1_22
) as b
on a.id=b.id
;
quit;

LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Re: LEft joining same tables multiple times

Classroom Training Available!