BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

Hi friends,

I have a table A that I need to join to one of 4 source tables. The weird thing is that the source tables house different months... (same format and structure). So the tables houses months as follows:

Tablemonths available
A1,5,9
B2,6,10
C3,7,11
D4,8,12

 

I have table F like:

IDSTART_dateEND_date
                                                     122020-06-012020-08-30

 

Which i need to left join to one of the below tables where the value appears right before the end_Date 

Table A

IDdatecost
                                                     122020-01-155.00
                                                     122020-05-20                       6.00

 

Table b

 

IDdatecost
                                                     122020-02-155.00
                                                     122020-06-20                       6.00

 

Table c

IDdatecost
                                                     122020-03-155.00
                                                     122020-11-20                       6.00

 

Table D

IDdatecost
                                                     122020-04-155.00
                                                     122020-08-12                       15.00

 

The join should give me because table D has the date and value right before the end_date in Table F

IDSTART_dateEND_DateDateCost
                                                     122020-06-012020-08-302020-08-1215.00

 

A. How to do the join where the query picks the right table? the tables are huge and I read that using the union function may not be ideal even in a pass through. Also the tables will have multiple values in the same month so i need to pick the cost that was recorded right before the end_Date

 

Any suggestions would be incredibly helpful

 

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I pulled this suggestion out (deleted it) after re-reading your original post and the mention about the size of tables a-b-c-d. I thought the union of those tables would create an even greater problem. But it could be that SAS/SQL is cleaver enough to avoid creating a temporary table from the union of tables a-b-c-d.

Do you get decent performance with this approach?

 

here it is, in a more readable format

 

data a;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-01-15  5.00
12 2020-05-20                         6.00
;
data b;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-02-15  5.00
12 2020-06-20                         6.00
;
data c;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-03-15  5.00
12 2020-11-20                         6.00
;
data d;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-04-15  5.00
12 2020-08-12                         15.00
;

data f;
input ID (START_date  END_date) (:yymmdd10.);
format START_date  END_date yymmdd10.;
datalines;
12 2020-06-01  2020-08-30
;

proc sql;
create table z as
select
    f.*,
    x.date format=yymmdd10.,
    x.cost
from 
    f left join
    (   select * from a 
        union all 
        select * from b 
        union all
        select * from c 
        union all
        select * from d 
        ) as x on f.id=x.id and x.date between f.START_date and f.END_date
group by f.id
having x.date=max(x.date);
select * from z;
quit;

image.png

PG

View solution in original post

9 REPLIES 9
PGStats
Opal | Level 21

Please explain further where the date=08-30 and cost=15 come from. They don't appear anywhere in the example data.

PG
TheNovice
Quartz | Level 8
Apologies, i have edited the post and made the correction. Does that make sense?
PGStats
Opal | Level 21

If table f is much smaller than a-b-c-d, it might be better to perform 4 left joins and then keep the max date for each id.

PG
TheNovice
Quartz | Level 8
Smaller yes but table F is still over a million records... right now i am doing 4 separate left joins and then using data step to consolidate all....
TheNovice
Quartz | Level 8

I got an email with your reply and it works thank you

 

proc sql;

create table z as

select   f.*,   x.date format=yymmdd10.,   x.cost

from f left join  

(   select * from a        union all        select * from b        union all       select * from c        union all      select * from d        ) as x on f.id=x.id and x.date

between f.START_date and f.END_dategroup by f.idhaving x.date=max(x.date);

select * from z;quit;

PGStats
Opal | Level 21

I pulled this suggestion out (deleted it) after re-reading your original post and the mention about the size of tables a-b-c-d. I thought the union of those tables would create an even greater problem. But it could be that SAS/SQL is cleaver enough to avoid creating a temporary table from the union of tables a-b-c-d.

Do you get decent performance with this approach?

 

here it is, in a more readable format

 

data a;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-01-15  5.00
12 2020-05-20                         6.00
;
data b;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-02-15  5.00
12 2020-06-20                         6.00
;
data c;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-03-15  5.00
12 2020-11-20                         6.00
;
data d;
input ID  date :yymmdd10.    cost;
datalines;
12 2020-04-15  5.00
12 2020-08-12                         15.00
;

data f;
input ID (START_date  END_date) (:yymmdd10.);
format START_date  END_date yymmdd10.;
datalines;
12 2020-06-01  2020-08-30
;

proc sql;
create table z as
select
    f.*,
    x.date format=yymmdd10.,
    x.cost
from 
    f left join
    (   select * from a 
        union all 
        select * from b 
        union all
        select * from c 
        union all
        select * from d 
        ) as x on f.id=x.id and x.date between f.START_date and f.END_date
group by f.id
having x.date=max(x.date);
select * from z;
quit;

image.png

PG
TheNovice
Quartz | Level 8

Thank you very much. The performance wasn't so bad. Took about 10 minutes. I think it will run longer once the original table gets bigger. Can't think of a way around it but i appreciate your help

PGStats
Opal | Level 21

For your real life task, it might be worth replacing the "*" with only the actual variable names that you really need.

PG
TheNovice
Quartz | Level 8
You are correct. the 10 min run time was with me choosing the correct variables. With * it was closer to 30 mins

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 842 views
  • 1 like
  • 2 in conversation