Desktop productivity for business analysts and programmers

Which join would be best

Reply
Occasional Contributor
Posts: 14

Which join would be best

[ Edited ]

I have 4 different tables with a certain metric in each table, and 3 columns in each table, that I am attempting to join on. 

 

the data is as follows:

 

table 1

Region | Center | Assoc | Metric 1 | Descriptor 1

a            1            1           z            

c            3            3           z

d            4            4           z

 

table 2

Region | Center | Assoc | Metric 2 | Descriptor 2

a            1            1            t

b            2            2            t

c            3            3            t

 

table 3

Region | Center | Assoc | Metric 3 | Descriptor 3

a            1            1            y

b            2            2            y

c            3            3            y

d            4            4            y

 

table 4

Region | Center | Assoc | Metric 4 | Descriptor 4

a             1            1           x

d             4            4           x

 

 

the result that I am looking for would have all regions, centers, and associates listed whether or not they are in 1 or all of the tables.  and all the metrics are filled in, even if its missing.  The important part is that each region, center, and associate that is in any of the 4 tables be shown in the resulting table.

 

here's what I'm looking for:

Region | Center | Assoc | Metric 1 | Descriptor 1 | Metric 2 | Descriptor 2 | Metric 3 | Descriptor 3 | Metric 4 | Descriptor 4

a             1           1           z                                     t                                     y                                    x

b             2           2           .                                      t                                     y                                    .

c             3           3           z                                     t                                     y                                    .

d             4           4           z                                     .                                     y                                    x

 

I have tried a full join as seen below, and

FROM WORK.BC_ACTIVATION_OFFICERS t1
           FULL OUTER JOIN WORK.BC_CLOSEDIN90_OFFICERS t2 ON (t1.'Region No'n = t2.'Region No'n) AND (t1.'Center No'n = t2.'Center No'n) AND (t1.Officer = t2.Officer)
           FULL OUTER JOIN WORK.CLOSED_W_NEW_WITHIN_30D_OFFICERS t3 ON (t1.'Region No'n = t3.'Region No'n) AND (t1.'Center No'n = t3.'Center No'n) AND (t1.Officer = t3.'New Acct Officer'n)
		  FULL OUTER JOIN WORK.BC_APPS_BY_OFFICER t4 ON (t1.'Region No'n = t4.'Region No'n) AND (t1.'Center No'n = t4.'Center No'n) AND (t1.Officer = t4.Officer)

and I get missing regions and centers and officers.  I have checked the tables individually, and there are no missing regions, centers or associates in any of them

 

 

Ideas?

 

Thanks,

El

Super User
Posts: 23,980

Re: Which join would be best

[ Edited ]

Can you show the expected output you would want from that sample input?

 

If all the files have the same variables, usually the data is stacked, not merged, so to understand your desired outcome would help.

 

Spoiler

EIrvin wrote:

I have 4 different tables with a certain metric in each table, and 3 columns in each table, that I am attempting to join on. 

 

the data is as follows:

 

table 1

Region | Center | Assoc | Metric 1 | Descriptor 1

a            1            1           z            

c            3            3           z

d            4            4           z

 

table 2

Region | Center | Assoc | Metric 2 | Descriptor 2

a            1            1            t

b            2            2            t

c            3            3            t

 

table 3

Region | Center | Assoc | Metric 3 | Descriptor 3

a            1            1            y

b            2            2            y

c            3            3            y

d            4            4            y

 

table 4

Region | Center | Assoc | Metric 4 | Descriptor 4

a             1            1           x

d             4            4           x

 

 

the result that I am looking for would have all regions, centers, and associates listed whether or not they are in 1 or all of the tables.  and all the metrics are filled in, even if its missing.  The important part is that each region, center, and associate that is in any of the 4 tables be shown in the resulting table.

 

here's what I'm looking for:

Region | Center | Assoc | Metric 1 | Descriptor 1 | Metric 2 | Descriptor 2 | Metric 3 | Descriptor 3 | Metric 4 | Descriptor 4

a             1           1           z                                     t                                     y                                    x

b             2           2           .                                      t                                     y                                    .

c             3           3           z                                     t                                     y                                    .

d             4           4           z                                     .                                     y                                    x

 

I have tried a full join as seen below, and

FROM WORK.BC_ACTIVATION_OFFICERS t1
           FULL OUTER JOIN WORK.BC_CLOSEDIN90_OFFICERS t2 ON (t1.'Region No'n = t2.'Region No'n) AND (t1.'Center No'n = t2.'Center No'n) AND (t1.Officer = t2.Officer)
           FULL OUTER JOIN WORK.CLOSED_W_NEW_WITHIN_30D_OFFICERS t3 ON (t1.'Region No'n = t3.'Region No'n) AND (t1.'Center No'n = t3.'Center No'n) AND (t1.Officer = t3.'New Acct Officer'n)
		  FULL OUTER JOIN WORK.BC_APPS_BY_OFFICER t4 ON (t1.'Region No'n = t4.'Region No'n) AND (t1.'Center No'n = t4.'Center No'n) AND (t1.Officer = t4.Officer)

and I get missing regions and centers and officers.  I have checked the tables individually, and there are no missing regions, centers or associates in any of them

 

 

Ideas?

 

Thanks,

El


Occasional Contributor
Posts: 14

Re: Which join would be best

I was in the process of editing, just that, when you responded.  It should be out there now

 

Super User
Posts: 23,980

Re: Which join would be best

 and all the metrics are filled in, even if its missing

How does that work when there's no data in the tables?

 

It looks like you're trying to recreate a data from a star schema structure/fact+dim tables. Is that a correct assumption?

Occasional Contributor
Posts: 14

Re: Which join would be best

I'm not sure I know what a star schema is.  but because the metrics were so acutely focused, I had to create tables just for those metrics, however, my audience wants to see them all rolled up together.  But I can't go to a mast list of all associates and centers and regions, because that master list would be too large. so I was attempting to capture those regions, centers, and associates that appeared in any one of those tables, and show all metrics in a single line for each of the region/center/associates ...  if that makes any sense

Super User
Posts: 23,980

Re: Which join would be best

You do have a master table though, with all the information you need? If so, it may be easier to subset from that table than do the re-merges, based on the keys in these tables.
Super User
Posts: 6,921

Re: Which join would be best

[ Edited ]

You don't show which fields you are selecting, but presumably the key fields (Region, Center, Assoc) are selected from just one of the tables.  However ...

 

Remember, a full outer join takes records from both tables.  There can be mismatches.  If you are selecting Region, Center, and Assoc from the first table, it is possible that some Regions appear in the second table only.  Those records get selected by a full outer join, but there is no Region value found in the first table.

 

Are you able to use MERGE instead of PROC SQL?

Occasional Contributor
Posts: 14

Re: Which join would be best

Posted in reply to Astounding

SQL is the only language I know.  how does merge work in these situations?

Super User
Posts: 6,921

Re: Which join would be best

[ Edited ]

Assuming you have four SAS data sets, sort each of them:

 

by region center assoc;

 

Then combine them:

 

data want;

merge table1 table2 table3 table4;

by region center assoc;

run;

 

That will take care of missing values for the three key variables.  Of course there can be missing values for some of the metrics.

 

*********** EDITED:

 

If you don't have four SAS data sets (or if MERGE makes you uncomfortable), you can probably revise your SELECT statement along these lines:

 

select coalesce(t1.region, t2.region, t3.region, t4.region) as region,

 

... same idea for the other two joining variables.

Contributor
Posts: 30

Re: Which join would be best

if i understand the request correct, here is what i would do:

i would create a list of distinct values for Region | Center | Association by something like this 

proc sql noprint;

create table distinctvalues as

SELECT DISTINCT Region, Center, Association

FROM

SELECT Region, Center, Association FROM Table1 UNION ALL

SELECT Region, Center, Association FROM Table2 UNION ALL

SELECT Region, Center, Association FROM Table3 UNION ALL

SELECT Region, Center, Association FROM Table4

) a

quit;

 

i would then left join the 4 tables to the DistinctValues tables

Super User
Posts: 23,980

Re: Which join would be best

Posted in reply to utrocketengineer

Or use the distinct list to filter your master list. 

 

proc sql;
create table want as
select m.* 
from master as m
left join distinctValues as d
on m.region =d.region
and m.center=d.center
and m.association = d.association;
quit;
PROC Star
Posts: 549

Re: Which join would be best

probably @Astounding is the way to go. but in sql you can do this as 

proc sql;
select region, 
     center , 
     assoc, 
     max(metric1) as metric1, 
     max(Descriptor1) as Descriptor1,
     max(metric2) as metric2,
     max(Descriptor2) as Descriptor2,
     max(metric3) as  metric3,
     max(Descriptor3) as Descriptor3,
      max(metric4) as  metric4,
     max(Descriptor1) as Descriptor4
     from 
          (select coalesce(a.Region,b.region,c.region,d.region) as region ,
                   coalesce(a.Center,b.center,c.center,d.center) as center, 
                   coalesce(a.Assoc, b.assoc, c.assoc,d.assoc) as assoc ,
                   Metric1  ,
                   Descriptor1,
                   Metric2  ,
                 Descriptor2,
                 Metric3  ,
                 Descriptor3,
                 Metric4  ,
                Descriptor4
         from table1 a
         full join
        table2 b
       on a.Region=b.region 
       and a.Center=b.center
       and  a.Assoc= b.assoc
      full join 
     table3 c
     on a.Region=c.region 
     and      a.Center=c.center
     and      a.Assoc= c.assoc
   full join 
    table4 d
    on a.Region=d.region 
   and      a.Center=d.center
    and      a.Assoc= d.assoc)a
   group by region, center, assoc

 ;
Ask a Question
Discussion stats
  • 11 replies
  • 206 views
  • 4 likes
  • 5 in conversation