PROC DS2 performance issues using numeric variables

Hi all,

I posted the same question on Stack Exchange, but thought I might get better help here.

 

I am trying to use PROC DS2 to get a performance gain over the normal data step by using its multithreading capability.
fred.testdata is an SPDE data set containing 5 million observations. My code is below:

 

proc ds2;
   thread home_claims_thread / overwrite = yes;
   /*declare char(10) producttype;
   declare char(12) wrknat_clmtype;
   declare char(7) claimtypedet;
   declare char(1) event_flag;*/
   /*declare date week_ending having format date9.;*/
   method run();
      /*declare char(7) _week_ending;*/
      set fred.testdata;
      if claim = 'X' then claimtypedet= 'ABC';
      else if claim = 'Y' then claimtypedet= 'DEF';
      /*_week_ending = COMPRESS(exposmth,'M');
    week_ending = to_date(substr(_week_ending,1,4) || '-' || substr(_week_ending,5,2) || '-01');*/
   end;
   endthread;

data home_claims / overwrite = yes;
   declare thread home_claims_thread t; 
   method run();
      set from t threads=8;
   end;
enddata;
run;
quit;

 

I only included a few of the IF statements; the full set would have taken up a few pages, but hopefully you get the idea. As written, the code runs considerably faster than the normal data step. However, performance degrades significantly when either of the following happens:

  1. I uncomment any of the declare statements
  2. I include any numeric variables in fred.testdata (even without performing any calculations on the numeric variables)
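
For comparison, a single-threaded data step version of the same trimmed logic would look roughly like the sketch below. The output name work.home_claims_ds is hypothetical, the length of claimtypedet is taken from the commented-out declare, and the fred libref is assumed to be assigned already. FULLSTIMER is only there so the log shows real and CPU time for the step.

/* Sketch only: data step baseline for timing against the DS2 version */
options fullstimer;                    /* print real/CPU time in the log */

data work.home_claims_ds;              /* hypothetical output name */
   length claimtypedet $7;             /* assumed, from the commented declare char(7) */
   set fred.testdata;
   if claim = 'X' then claimtypedet = 'ABC';
   else if claim = 'Y' then claimtypedet = 'DEF';
run;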

My questions are:

  1. Is there any way to include numeric variables in fred.testdata without a slowdown that makes DS2 much slower than the normal data step? For this small 5-million-row table with one or more numeric columns, real time is about 1 min 30 s for DS2 versus 20 seconds for the normal data step; DS2 without the declare statements and numeric variables takes about 7 seconds. The actual full table is closer to 600 million rows. For example, I would like to do the week_ending conversion without a 5x penalty in run time. Watching "nmon", I noticed that as soon as I uncomment the week_ending logic the step drops to a single thread, and as soon as I comment it out again it goes back to using all 8 threads.
  2. Is there any way to compress the output table within DS2, without needing an additional data step just to compress it?

 

Thank you
