Hi people,
What's the best way to improve this SQL code below for having a good performance?
Could be in Proc SQL or Data Step.
Any help will be greatly acceptable;
Each table has about 13.018.525 rows and 250 columns.
proc sql;
create table preparacao as
select a.DOSSIE,
a.TDPRCPRODCOM,
a.TDPRCPOSA,
a.TDPRNAGCRCVT,
a.JAN_CREAT,
a.LMA,
a.TDPRMFORIG,
a.TDPRBFINORIG,
a.CADEIA,
a.atraso_atual_old,
a.TDPRCPAIEMENT,
a.SALDO_TOTAL,
a.TDPRCVERROU,
a.TALPCPSA,
a.TDPRNQUANT,
a.TCLINIDENT,
b.atraso_atual_old as AT_H1,
(b.SALDO_PAG/b.SALDO_MES) as PCT_PAG_H1,
b.SALDO_TOTAL as SALDO_H1,
b.TODU_PFRA as TODU_PFRA_H1,
c.atraso_atual_old as AT_H2,
(c.SALDO_PAG/c.SALDO_MES) as PCT_PAG_H2,
c.SALDO_TOTAL as SALDO_H2,
c.TODU_PFRA as TODU_PFRA_H2,
d.atraso_atual_old as AT_H3,
(d.SALDO_PAG/d.SALDO_MES) as PCT_PAG_H3,
d.SALDO_TOTAL as SALDO_H3,
d.TODU_PFRA as TODU_PFRA_H3,
sum(b.RECEITA,-c.RECEITA) as RECEITA_H1,
sum(c.RECEITA,-d.RECEITA) as RECEITA_H2,
sum(d.RECEITA,-e.RECEITA) as RECEITA_H3,
i.mtrotmes,
i.mail,
i.sms,
i.tlmkt,
i.TCLILNOMPREN,
i.TCLFNDDDP,
i.TCLFNTELP,
i.TCLINDDDEMP,
i.TCLINTELEMP,
i.TCLINDDDCEL,
i.TCLINTELCEL,
i.TVDCHAINE,
i.tdprnsoc,
i.talpcposdosa,
i.tdcpcficp,
i.d_cre,
i.TDCPBMONTSSR12M,
i.D_PCP
from nbdos.base_portmanagement_201203 as a left join nbdos.base_portmanagement_201202 as b on a.DOSSIE=b.DOSSIE
left join nbdos.base_portmanagement_201201 as c on a.DOSSIE=c.DOSSIE
left join nbdos.base_portmanagement_201112 as d on a.DOSSIE=d.DOSSIE
left join nbdos.base_portmanagement_201111 as e on a.DOSSIE=e.DOSSIE
left join baseseg_201204 as i on a.DOSSIE=i.DOSSIE
;
quit;
Thanks.
Your code looks pretty straight forward to me. I'm assuming you're having performance issues of some sort?
If so my guess would be your last join.
That looks like you're joining to a work table rather than one stored on a library (possibly server) tables and then you'll have performance issues.
Is that what you're situation?
Hi Reeza, thanks for your response.
The library nbdos is permanent.
In fact I'm just trying to get this step better.
The problem is, i'm joining 6 tables to get some variables from each one of them by proc sql that I guess is not the best way for this accomplish
and moreover left join makes sort firstly and consume alot of cpu resources, despite merge would need to sort too.
did you try hash joins?
Hi sassharp,
I don't know if hash would it be the best solution for this because as i mentioned before each table has about 13.018.525 rows.
It doesn't matter that there are 13 million rows, you only use a few variables from the look-up tables, so the memory footprint won't be huge (depending on how long the variables are). Use option fullstimer to gauge memory usage.
In my quick test, 13m rows use 500MB of RAM per table, so with 3GB of RAM you can load all 6 hash tables in one go.
At worse, you can do the hash look-ups in successive data steps (especially for table i where you fetch more variables), it will still be way faster that sorting.
Adding to Reeza's question. Please tell us for each table :
PG
Hi PG,
Here are the responses:
1 - It's remote.
2 - It's managed by a server.
3 - Yes, all table have index defined on DOSSIE.
Augusto Souza.
Expanding on question 2 : What kind of server? File or database? Could the query be processed at the server level? - PG
Its all on a server you say, but your last table was from a work directory?
How is your performance when you exclude that last table?
If all tables are on the server and your work table is in your work library SAS has to download all the files before you can do anything on it.
SAS-SQL might not be able to farm out the 2-parameter MAX functions either...
PG
do you mean sum? perhaps not...but those could be rewritten to be SQL compliant easily enough.
Also set
Option msglevel=i;
And run your query to see if the indexes are being used, it'll indicate this in the log.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.