BookmarkSubscribeRSS Feed
Augusto
Obsidian | Level 7

Hi everyone, i have this code below and I would like to enhance it.

Any suggestion will help me so much. Maybe using hash would be better? If so, how to translate it to.

proc sort data=wk.seguros_carona&A4M2.  ; by OCTROI ; run;

proc sort data=pfat.refcto3m&A4M2m1.      ; by OCTROI ; run;

proc sort data=wk.pf_pfra_&A4M2.            ; by OCTROI ; run;

proc sort data=mvt_pfra                      ; by OCTROI ; run;

data wk.pf_filtros&corte._cart&A4M2.;

merge           wk.pf_&corte._cart&A4M2.          (in=a)

          wk.seguros_carona&A4M2.           (in=_seguro )

          pfat.refcto3m&A4M2m1.          (in=_rfcto3 )

          pfat.refctovd&A4M2m1.          (in=_refinvd drop=TDPRNFOY TDPRNCONT)

          wk.pf_pfra_&A4M2.                    (in=_pfra )

          mvt_pfra                              (in=_mvt );

by OCTROI ;

if a;

retain publico;

          if not _refinvd          and intck('month',d_cre,&dtbdos) ge 12

          and tvdchaine ne &submarino          then publico=0;

          else if not _refinvd                    then publico=1;

          else if _rfcto3                    then publico=3;

          else                               publico=2;

           if tdprnsoc ne (100)                                                                                then _filtro=1;

          else if (talpcposdosa in ("560" "570" "CG1" "CG2" "RET" "DCD" "CTX" "USU")

           or tdprnagcrcvt in (560 570))                                                                      then _filtro=3;

          else if cposa ne 'ENC'                                                                       then _filtro=4;

          else if verrou in (45,55,52) or verrou ge 60                                                            then _filtro=5;

          else if tdcpcficp in (5)                                                                      then _filtro=6;

          else if 0<=todu <100                                                                                                then _filtro=7;

          else if todu <7500                                                                                then _filtro=8;

          else if tvdchaine not in &submarino and (tdcpbmontssr12m gt 2 or

(tdcpbmontssr12m eq 1 and intck('month',d_dms,&dtbdos) lt 3) or

(tdcpbmontssr12m eq 2 and intck('month',d_dms,&dtbdos) lt 6) )                                         then _filtro=9;

else if tvdchaine eq &submarino and (d_dms ne . and

intck('month',d_dms,&dtbdos) lt 3)                                                                       then _filtro=9.1;

          else if modo_pagamento eq 'P'                                                             then _filtro=10;

          else if DEGRESSIVIDADE ne 997 and (DEGRESSIVIDADE<901 or DEGRESSIVIDADE>950)          then _filtro=11;

          else if tvdchaine eq 9100280                                                                      then _filtro=13;

          else if intck('month',d_pcp,&dtbdos) lt 1 and

          tvdchaine not in &submarino                                                                      then _filtro=14;

          else if _seguro                                                                                then _filtro=15;

          else if atpfra ge 1          or _mvt                                                                      then _filtro=16;

              else if tvdchaine ne 9106782 and GRENT = "Nao Abordados"                                        then _filtro=17;

          else if tvdchaine eq 9106782 and publico in (0,1) and NO_SUBA in (201)                               then _filtro=18;

          else if tvdchaine eq 9106782 and publico in (2) and NO_SUBA in (103)                               then _filtro=18;

          else if tvdchaine eq 9106782 and publico in (3) and NO_SUBA in (103,146,201)                     then _filtro=18;

          else                                                                                           _filtro=99;

run;

Thanks so much.

10 REPLIES 10
Doc_Duke
Rhodochrosite | Level 12

A couple of thoughts.

1) publico does not need to be RETAINed, as it is assigned for each pass through the DATA step.

2) the second IF-THEN-ELSE  sequence might have it's performance by rearranging it to have the highest frequency values of _filtro come first.

Doc Muhlbaier

Duke

Augusto
Obsidian | Level 7

I've these questions?

Would you recommend to make merge with many data sets for this case?

Could it be better using proc sql, considering that I would not need to sort the data sets?

Thanks for the answers.

Doc_Duke
Rhodochrosite | Level 12

You could replace the MERGE with a PROC SQL.  The SORT is done in the backround by SQL, so it still gets done.

You would be hard pressed to do the second IF-THEN-ELSE  sequence in SQL.  You could, but it would take a bit of work.  If you don't, it's another pass through the data.

Coming up with 'best performace' has so many varaibles to consider that  we are unlikely to hit it just as an abstract process.

TomKari
Onyx | Level 15

What you have created is a very reasonable solution to the problem. Merging data can always be troublesome.

I don't see anything in it that is inherently slow.

  How many records are in each of your input files?

  Which steps do you feel are taking too long?

A couple of suggestions, but I doubt they'll help much.

  If there are a lot of variables in your input datasets that you won't need, remove them coming into the sorts, keeping just the variables you need.

  You might see some slight improvements from your data step by reordering the if statements so that the options that occur most frequently are tested for first.

I believe that you'd only see a benefit from using hash tables if your sort is too slow. Usually SAS sorts data very quickly.

Another option is to replace the sort/merge process with a PROC SQL join, but as the underlying process will be very similar, I don't see major improvements. Maybe someone else has experience with comparing sort/merge with join.

Tom

Augusto
Obsidian | Level 7

The data sets wk.pf_&corte._cart&A4M2 and mvt_pfra and wk.pf_pfra_&A4M2. are very large. the other are small

Doc_Duke
Rhodochrosite | Level 12

One other performance tool that may be applicable is the use of FORMATs for subsetting.  If the small datasets just have a few variables that are used in the subsetting, then you may be able to convert some of the variables into formats for the subsetting.

Augusto
Obsidian | Level 7

Thanks Doc@Duke and Tom. I'm considering your suggestions. What about if I create one data set with merge/join taking all variable needed and make the conditional later? Maybe using FORMAT.

Could it be a better solution?

Thanks

Augusto Souza

ChrisNZ
Tourmaline | Level 20

Try to use hash tables and use option fullstimer to watch memory usage.

When you have speed issues, only discard the hash table option when you have tested and are sure you don't have enough memory. In this case, one large table and few small ones might well fit in memory.

More RAM might be a cheap and effective improvement in your case if you are not limited by the OS. Another tiny hardware change that will make a huge difference on things like sorts is to use solid state storage for the work area. Highly recommended.

shivas
Pyrite | Level 9

Hi,

Is Proc Sort is taking lot of time ...you can avoid that by creating index on OCTROI variable and you avoid sort steps completely.

Thanks,

Shiva

ChrisNZ
Tourmaline | Level 20

Shivas, have you ever tried to merge large unsorted tables by index? It is way slower than sorting in my experience.

If memory doesn't allow hash tables, I reckon formats is the next thing to try.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1324 views
  • 6 likes
  • 5 in conversation