BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

Hi friends,

I have a table A that I need to join to one of 4 source tables. The weird thing is that the source tables house different months... (same format and structure). So the tables houses months as follows:

Tablemonths available
A1,5,9
B2,6,10
C3,7,11
D4,8,12

 

I have table F like:

IDSTART_dateEND_date
                                                     122020-06-012020-08-30

 

Which i need to left join to one of the below tables where the value appears right before the end_Date 

Table A

IDdatecost
                                                     122020-01-155.00
                                                     122020-05-20                       6.00

 

Table b

 

IDdatecost
                                                     122020-02-155.00
                                                     122020-06-20                       6.00

 

Table c

IDdatecost
                                                     122020-03-155.00
                                                     122020-11-20                       6.00

 

Table D

IDdatecost
                                                     122020-04-155.00
                                                     122020-08-12                       15.00

 

The join should give me because table D has the date and value right before the end_date in Table F

IDSTART_dateEND_DateDateCost
                                                     122020-06-012020-08-302020-08-1215.00

 

A. How to do the join where the query picks the right table? the tables are huge and I read that using the union function may not be ideal even in a pass through. Also the tables will have multiple values in the same month so i need to pick the cost that was recorded right before the end_Date

 

Any suggestions would be incredibly helpful

 

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I pulled this suggestion out (deleted it) after re-reading your original post and the mention about the size of tables a-b-c-d. I thought the union of those tables would create an even greater problem. But it could be that SAS/SQL is cleaver enough to avoid creating a temporary table from the union of tables a-b-c-d.

Do you get decent performance with this approach?

 

here it is, in a more readable format

 

data a;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-01-15  5.00
12 2020-05-20                         6.00
;
data b;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-02-15  5.00
12 2020-06-20                         6.00
;
data c;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-03-15  5.00
12 2020-11-20                         6.00
;
data d;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-04-15  5.00
12 2020-08-12                         15.00
;

data f;
input ID (START_date  END_date) (:yymmdd10.);
format START_date  END_date yymmdd10.;
datalines;
12 2020-06-01  2020-08-30
;

proc sql;
create table z as
select
    f.*,
    x.date format=yymmdd10.,
    x.cost
from 
    f left join
    (   select * from a 
        union all 
        select * from b 
        union all
        select * from c 
        union all
        select * from d 
        ) as x on f.id=x.id and x.date between f.START_date and f.END_date
group by f.id
having x.date=max(x.date);
select * from z;
quit;

image.png

PG

View solution in original post

9 REPLIES 9
PGStats
Opal | Level 21

Please explain further where the date=08-30 and cost=15 come from. They don't appear anywhere in the example data.

PG
TheNovice
Quartz | Level 8
Apologies, i have edited the post and made the correction. Does that make sense?
PGStats
Opal | Level 21

If table f is much smaller than a-b-c-d, it might be better to perform 4 left joins and then keep the max date for each id.

PG
TheNovice
Quartz | Level 8
Smaller yes but table F is still over a million records... right now i am doing 4 separate left joins and then using data step to consolidate all....
TheNovice
Quartz | Level 8

I got an email with your reply and it works thank you

 

proc sql;

create table z as

select   f.*,   x.date format=yymmdd10.,   x.cost

from f left join  

(   select * from a        union all        select * from b        union all       select * from c        union all      select * from d        ) as x on f.id=x.id and x.date

between f.START_date and f.END_dategroup by f.idhaving x.date=max(x.date);

select * from z;quit;

PGStats
Opal | Level 21

I pulled this suggestion out (deleted it) after re-reading your original post and the mention about the size of tables a-b-c-d. I thought the union of those tables would create an even greater problem. But it could be that SAS/SQL is cleaver enough to avoid creating a temporary table from the union of tables a-b-c-d.

Do you get decent performance with this approach?

 

here it is, in a more readable format

 

data a;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-01-15  5.00
12 2020-05-20                         6.00
;
data b;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-02-15  5.00
12 2020-06-20                         6.00
;
data c;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-03-15  5.00
12 2020-11-20                         6.00
;
data d;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-04-15  5.00
12 2020-08-12                         15.00
;

data f;
input ID (START_date  END_date) (:yymmdd10.);
format START_date  END_date yymmdd10.;
datalines;
12 2020-06-01  2020-08-30
;

proc sql;
create table z as
select
    f.*,
    x.date format=yymmdd10.,
    x.cost
from 
    f left join
    (   select * from a 
        union all 
        select * from b 
        union all
        select * from c 
        union all
        select * from d 
        ) as x on f.id=x.id and x.date between f.START_date and f.END_date
group by f.id
having x.date=max(x.date);
select * from z;
quit;

image.png

PG
TheNovice
Quartz | Level 8

Thank you very much. The performance wasn't so bad. Took about 10 minutes. I think it will run longer once the original table gets bigger. Can't think of a way around it but i appreciate your help

PGStats
Opal | Level 21

For your real life task, it might be worth replacing the "*" with only the actual variable names that you really need.

PG
TheNovice
Quartz | Level 8
You are correct. the 10 min run time was with me choosing the correct variables. With * it was closer to 30 mins
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1986 views
  • 1 like
  • 2 in conversation