05-17-2012 03:15 PM
What's the best way to improve this SQL code below for having a good performance?
Could be in Proc SQL or Data Step.
Any help will be greatly appreciated.
Each table has about 13,018,525 rows and 250 columns.
proc sql;
create table preparacao as select
b.atraso_atual_old as AT_H1,
(b.SALDO_PAG/b.SALDO_MES) as PCT_PAG_H1,
b.SALDO_TOTAL as SALDO_H1,
b.TODU_PFRA as TODU_PFRA_H1,
c.atraso_atual_old as AT_H2,
(c.SALDO_PAG/c.SALDO_MES) as PCT_PAG_H2,
c.SALDO_TOTAL as SALDO_H2,
c.TODU_PFRA as TODU_PFRA_H2,
d.atraso_atual_old as AT_H3,
(d.SALDO_PAG/d.SALDO_MES) as PCT_PAG_H3,
d.SALDO_TOTAL as SALDO_H3,
d.TODU_PFRA as TODU_PFRA_H3,
sum(b.RECEITA,-c.RECEITA) as RECEITA_H1,
sum(c.RECEITA,-d.RECEITA) as RECEITA_H2,
sum(d.RECEITA,-e.RECEITA) as RECEITA_H3
from nbdos.base_portmanagement_201203 as a left join nbdos.base_portmanagement_201202 as b on a.DOSSIE=b.DOSSIE
left join nbdos.base_portmanagement_201201 as c on a.DOSSIE=c.DOSSIE
left join nbdos.base_portmanagement_201112 as d on a.DOSSIE=d.DOSSIE
left join nbdos.base_portmanagement_201111 as e on a.DOSSIE=e.DOSSIE
left join baseseg_201204 as i on a.DOSSIE=i.DOSSIE;
quit;
05-17-2012 04:39 PM
Your code looks pretty straightforward to me. I'm assuming you're having performance issues of some sort?
If so, my guess would be your last join.
It looks like you're joining to a work table rather than one stored in a permanent library (possibly on a server), and that can cause performance issues.
Is that your situation?
05-17-2012 05:01 PM
Hi Reeza, thanks for your response.
The library nbdos is permanent.
In fact, I'm just trying to make this step run faster.
The problem is that I'm joining 6 tables to pull a few variables from each of them with PROC SQL, which I suspect is not the best way to accomplish this.
Moreover, the left join sorts first and consumes a lot of CPU resources, although a data step merge would need the tables sorted too.
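For reference, a minimal sketch of the sort-then-merge alternative mentioned above, assuming DOSSIE is the join key; the KEEP= variable lists are illustrative, taken from the query, and only one look-up month is shown:

```sas
/* Sketch only: a data step merge requires every table sorted by      */
/* the BY variable first -- this sorting cost is what's under debate. */
proc sort data=nbdos.base_portmanagement_201203 out=a;
   by DOSSIE;
run;
proc sort data=nbdos.base_portmanagement_201202
          out=b(keep=DOSSIE atraso_atual_old SALDO_PAG SALDO_MES
                     SALDO_TOTAL TODU_PFRA RECEITA);
   by DOSSIE;
run;
/* ... sort the remaining months (201201, 201112, 201111, baseseg) the same way ... */

data preparacao;
   merge a(in=ina) b(in=inb);   /* IN= flags let us mimic a left join */
   by DOSSIE;
   if ina;                      /* keep only rows present in the base table */
   if inb then PCT_PAG_H1 = SALDO_PAG / SALDO_MES;
run;
```

Note that if the base table carries variables with the same names as the look-up tables, they would need to be renamed or dropped on the MERGE statement to avoid overwriting.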
05-17-2012 06:46 PM
It doesn't matter that there are 13 million rows: you only use a few variables from the look-up tables, so the memory footprint won't be huge (depending on how long the variables are). Use option FULLSTIMER to gauge memory usage.
In my quick test, 13M rows used 500MB of RAM per table, so with 3GB of RAM you can load all 6 hash tables in one go.
At worst, you can do the hash look-ups in successive data steps (especially for table i, where you fetch more variables); it will still be way faster than sorting.
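The hash approach described above can be sketched roughly as follows, assuming DOSSIE is the key; only one look-up table is shown, and the variable list is taken from the original query (the structure is illustrative, not a drop-in replacement):

```sas
/* Sketch only: load one month into a hash keyed on DOSSIE, then      */
/* look rows up in a single pass over the 201203 base table.          */
options fullstimer;   /* report memory and CPU usage in the log */

data preparacao;
   /* define the look-up variables in the PDV without reading data */
   if 0 then set nbdos.base_portmanagement_201202
      (keep=atraso_atual_old SALDO_PAG SALDO_MES SALDO_TOTAL TODU_PFRA RECEITA);
   if _n_ = 1 then do;
      /* KEEP= only what's needed, to shrink the hash footprint */
      declare hash h1(dataset:
         "nbdos.base_portmanagement_201202
          (keep=DOSSIE atraso_atual_old SALDO_PAG SALDO_MES SALDO_TOTAL TODU_PFRA RECEITA)");
      h1.defineKey("DOSSIE");
      h1.defineData("atraso_atual_old","SALDO_PAG","SALDO_MES",
                    "SALDO_TOTAL","TODU_PFRA","RECEITA");
      h1.defineDone();
      /* ... repeat for 201201, 201112, 201111 and baseseg_201204 ... */
   end;
   set nbdos.base_portmanagement_201203;
   if h1.find() = 0 then do;   /* a match exists for this DOSSIE */
      AT_H1        = atraso_atual_old;
      PCT_PAG_H1   = SALDO_PAG / SALDO_MES;
      SALDO_H1     = SALDO_TOTAL;
      TODU_PFRA_H1 = TODU_PFRA;
   end;
run;
```

One caveat: if the base table contains variables with the same names as the hash data items, the FIND() call overwrites them, so they may need to be dropped or renamed on the SET statement.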
05-17-2012 06:01 PM
It's all on a server, you say, but your last table was from the work directory?
How is your performance when you exclude that last table?
If the other tables are on the server and one table is in your local work library, SAS has to download all the files before it can do anything with them.