BookmarkSubscribeRSS Feed
Dozo92i
Fluorite | Level 6

Hello everyone, 

I did a proc sql left join on many tables using a loop but it took too many time to perform since I got pretty big tables, so I wondered if there is a better way to do it. I got 12 tables with same variable ID and other numbered variables :

Table 1 has ID, A_1,B_1,C_1 as variables

Table 2 has ID, A_2, B_2, C_2 as variables ...

Table i has ID, A_i ,B_i, C_i as variables for i from 3 to 12.

 

I would like to have a single table which is the result of a left join of all those tables :  ID,A_1,B_1,C_1, A_2,B_2,C_2....A_12,B_12_C12 . Down below is an example of a macro i use for it.

 

%macro join();

 

%do j= 2 %to 12 ;

proc sql;
create table table1 as select
a.*,
b.*
from table1 as a left join table&j as b
on a.id=b.id ;
quit;

 

%end;

 

%mend;

 

Thiss gives me a result but it took too long if I have huge tables( around 20 millions of rows).

 

Thanks in advance.

11 REPLIES 11
Dozo92i
Fluorite | Level 6

Let's suppose that the ID identifies a unique customer and :

- A_i    represents the amount the customer should pay back for a car loan,

- B_ i   represents the amount the customer should pay back for a house loan

- C_ i  represents  the amount the customer should pay back for leisure loan 

 

" i "  is the number of the month (i=1 corresponds to January data, i=2 corresponds to February data and so on ). It means " table_i " are monthly historical tables that I want to gather into a unique dataset for a modeling purpose. 

Thanks.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then the data should be modeled in a way that allow simple variable naming, yet powerful storage:

ID  LOAN_TYPE  TIME_POINT   VALUE

1    Car                 dd-mmm-yyyy  xxx.xx

...

 

With the above structure you can have one dataset, with simple metadata - makes programming far simpler - minimum storage requirements (transposed data often has large quantities of blank data cells), and if need be at any point can be transposed up into the way you have it now.  There are no downsides to normalised data, only pluses.  Note also I put date in rather than just month thus avoiding nasty year issues, and allowing use of all the date functions.

Dozo92i
Fluorite | Level 6

Actually I have no power to modify the way those datas are modeled. Your modeling is smart but I think it would lead to very huge tables if for example we stored 10 or even  30 years of mothly historical datas. I wish I have the datas modeled the way you mentioned for my current task... 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Posting test data and required output sure helps get a point across.  From what you post I don't see why you do not use Base SAS merge, as this will make it far simpler for you:

data want;
  merge table:;
  by id;
run;

Note, I have not tested this, so not sure if the : actually works in this context, you may have to put table1 table2 etc.  Still be simpler coding than using sql.

Dozo92i
Fluorite | Level 6

Thanks for taking the time to answer.

 

Actually the problem is a bit difficult to explain that's why I use a simplified example.

I'll give a try to your suggestion to see if it improves the processing time.

 

Let's consider 7 tables (It could be more, like 12 as i mentioned in my post) 

 

DATA table1;
INPUT id A1 B1 C1;
DATALINES;
1 91 11 212
2 92 12 222
3 93 13 233
4 94 14 242
RUN;


DATA table2;
INPUT id A2 B2 C2;
DATALINES;
1 81 11 112
2 82 12 122
3 83 13 133
4 84 14 142
RUN;

DATA table3;
INPUT id A3 B3 C3;
DATALINES;
1 71 10 111
2 72 11 121
3 73 12 131
4 74 13 141
RUN;

 

DATA table4;
INPUT id A4 B4 C4;
DATALINES;
1 61 9 110
2 62 8 120
3 63 11 130
4 64 12 140
RUN;

DATA table5;
INPUT id A5 B5 C5;
DATALINES;
1 60 8 109
2 61 7 119
3 62 10 129
4 63 11 139
RUN;

DATA table6;
INPUT id A6 B6 C6;
DATALINES;
1 59 7 108
2 60 6 118
3 61 9 128
4 62 10 138
RUN;

DATA table7;
INPUT id A7 B7 C7;
DATALINES;
1 50 6 102
2 55 7 110
3 56 8 111
4 53 9 120
RUN;


%macro join();

%do j= 2 %to 7 ;

proc sql;
create table table1 as select
a.*,
b.*
from table1 as a left join table&j as b
on a.id=b.id ;
quit;

%end;

%mend;

%join()

 

The result should look like 

 

id   A1   B1   C1   A2   B2   C2   A3   B3   C3   A4   B4   C4   A5   B5   C5   A6   B6   C6   A7   B7   C7
1   91   11   212   81   11   112   71   10   111   61   9   110   60   8    109   59   7   108   50    6    102
2   92   12   222   82   12   122   72   11   121  62   8   120   61   7    119   60   6   118   55    7    110
3   93   13   233   83   13   133   73   12   131  63   11  130   62  10   129  61   9   128   56   8     111
4   94   14   242   84   14   142   74   13   141  64   12  140   63  11   139  62   10  138   53   9    120

 

Kurt_Bremser
Super User

Given that your example data show a 1:1 relationship all the way through, a simple data step merge will do it:

data want;
merge
  have1
  have2
  have3
  have4
  have5
  have6
  have7
;
by id;
run;

Once that works, you can build a list of dataset names in a data step, use call symput() to store it in a macro variable, and use that in the merge statement.

ballardw
Super User

And using a list such as @RW9 will work with merge.

 

data want;

   merge table: ;

   by id;

run;

 

so if your sets are nicely named, or even groups of "nice" names lists will work.

The "time" would come from large sets. It does take a certain amount of time to push lots of electrons around but I believe the overhead for a sing data step and merge is going to be less than multiple calls to Proc Sql.

Dozo92i
Fluorite | Level 6

Hi everyone, thanks for all your answers, 

@RW9 @Kurt_Bremser @ballardw I used "merge" to join all the tables but "merge" requires sorting first all those tables and that takes too much time since the tables are pretty big.  A trick I used to Left join all my tables is to proceed as followed : 

- I first do for all the tables a left join on the id of the first table  ( Actually, I wanted to left join all  the tables with respect to the table1's id ). I overwrite " table_j " for j from 2 to 12 :

 

%macro test();

 

%do j= 2 %to 12 ;

 

 proc sql;

    create table table&j as

    select * from table1(keep=id) as a left join table&j as b on a.id=b.id;

 quit; 

 

%end;

 

%mend;

 

- secondly I just use a data step with many sets :

 

data final_table;
set table1;
set table2;
set table3;
set table4;
set table5;
set table6;
set table7;
set table8;
set table9;
set table10;
set table11;
set table12;
run;

 

This works fine and reduce processing time but the problem is that the data step doesn't ensure to match the rigth id from one table number "j"  to another. 

 

Kurt_Bremser
Super User

To think that a sort is not needed with SQL is a mistake. SQL just does the sort on its own, and if table1 is not sorted already, SQL will sort it 12 times, and tables 2 to 12 each once, so you get 23 sorts.

Sorting all tables and doing a data step merge will need only 12 sorts.

 

Dozo92i
Fluorite | Level 6

Glad to know that.

Thanks !

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 5918 views
  • 7 likes
  • 4 in conversation