BookmarkSubscribeRSS Feed
Augusto
Obsidian | Level 7

Hi people,

What's the best way to improve this SQL code below for having a good performance?

Could be in Proc SQL or Data Step.

Any help will be greatly acceptable;

Each table has about 13.018.525 rows and 250 columns.

proc sql;

    create table preparacao as

    select a.DOSSIE,

           a.TDPRCPRODCOM,

           a.TDPRCPOSA,

           a.TDPRNAGCRCVT,

           a.JAN_CREAT,

           a.LMA,

           a.TDPRMFORIG,

           a.TDPRBFINORIG,

           a.CADEIA,

           a.atraso_atual_old,

           a.TDPRCPAIEMENT,

           a.SALDO_TOTAL,

           a.TDPRCVERROU,

           a.TALPCPSA,

           a.TDPRNQUANT,

           a.TCLINIDENT,

           b.atraso_atual_old as AT_H1,

           (b.SALDO_PAG/b.SALDO_MES) as PCT_PAG_H1,

           b.SALDO_TOTAL as SALDO_H1,

           b.TODU_PFRA as TODU_PFRA_H1,          

           c.atraso_atual_old as AT_H2,

           (c.SALDO_PAG/c.SALDO_MES) as PCT_PAG_H2,

           c.SALDO_TOTAL as SALDO_H2,

           c.TODU_PFRA as TODU_PFRA_H2,

           d.atraso_atual_old as AT_H3,

           (d.SALDO_PAG/d.SALDO_MES) as PCT_PAG_H3,

           d.SALDO_TOTAL as SALDO_H3,

           d.TODU_PFRA as TODU_PFRA_H3,

           sum(b.RECEITA,-c.RECEITA) as RECEITA_H1,

           sum(c.RECEITA,-d.RECEITA) as RECEITA_H2,

           sum(d.RECEITA,-e.RECEITA) as RECEITA_H3,

     i.mtrotmes,

           i.mail,

           i.sms,

           i.tlmkt,

           i.TCLILNOMPREN,

           i.TCLFNDDDP,

           i.TCLFNTELP,

           i.TCLINDDDEMP,

           i.TCLINTELEMP,

           i.TCLINDDDCEL,

           i.TCLINTELCEL,

           i.TVDCHAINE,

           i.tdprnsoc,

           i.talpcposdosa,

           i.tdcpcficp,

           i.d_cre,

           i.TDCPBMONTSSR12M,

           i.D_PCP

    from nbdos.base_portmanagement_201203 as a   left join nbdos.base_portmanagement_201202   as b on a.DOSSIE=b.DOSSIE

                                                 left join nbdos.base_portmanagement_201201   as c on a.DOSSIE=c.DOSSIE

                                                 left join nbdos.base_portmanagement_201112   as d on a.DOSSIE=d.DOSSIE

                                                 left join nbdos.base_portmanagement_201111   as e on a.DOSSIE=e.DOSSIE

                                                 left join baseseg_201204                                 as i on a.DOSSIE=i.DOSSIE

;

    quit;

Thanks.

12 REPLIES 12
Reeza
Super User

Your code looks pretty straight forward to me. I'm assuming you're having performance issues of some sort?

If so my guess would be your last join.

That looks like you're joining to a work table rather than one stored on a library (possibly server) tables and then you'll have performance issues. 

Is that what you're situation?

Augusto
Obsidian | Level 7

Hi Reeza, thanks for your response.

The library nbdos is permanent.

In fact I'm just trying to get this step better.

The problem is, i'm joining 6 tables to get some variables from each one of them by proc sql that I guess is not the best way for this accomplish

and moreover left join makes sort firstly and consume alot of cpu resources, despite merge would need to sort too.

sassharp
Calcite | Level 5

did you try hash joins?

Augusto
Obsidian | Level 7

Hi sassharp,

I don't know if hash would it be the best solution for this because as i mentioned before each table has about 13.018.525 rows.

ChrisNZ
Tourmaline | Level 20

It doesn't matter that there are 13 million rows, you only use a few variables from the look-up tables, so the memory footprint won't be huge (depending on how long the variables are). Use option fullstimer to gauge memory usage.

In my quick test, 13m rows use 500MB of RAM per table, so with 3GB of RAM you can load all 6 hash tables in one go.

At worse, you can do the hash look-ups in successive data steps (especially for table i where you fetch more variables), it will still be way faster that sorting.

PGStats
Opal | Level 21

Adding to Reeza's question. Please tell us for each table :

  1. Is it local or remote?
  2. Is it managed by a server?
  3. Does it have an index defined on DOSSIE?

PG

PG
Augusto
Obsidian | Level 7

Hi PG,

Here are the responses:

1 - It's remote.

2 - It's managed by a server.

3 - Yes, all table have index defined on DOSSIE.

Augusto Souza.


PGStats
Opal | Level 21

Expanding on question 2 : What kind of server? File or database? Could the query be processed at the server level? - PG

PG
Reeza
Super User

Its all on a server you say, but your last table was from a work directory?

How is your performance when you exclude that last table?

If all tables are on the server and your work table is in your work library SAS has to download all the files before you can do anything on it.

PGStats
Opal | Level 21

SAS-SQL might not be able to farm out the 2-parameter MAX functions either...

PG

PG
Reeza
Super User

do you mean sum? perhaps not...but those could be rewritten to be SQL compliant easily enough.

Reeza
Super User

Also set

Option msglevel=i;

And run your query to see if the indexes are being used, it'll indicate this in the log.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1176 views
  • 0 likes
  • 5 in conversation