How to reduce the time taken in running the following code.
Contributor
Posts: 36

 

Hello Friends,

I am running the code below, and it takes almost 16 hours to finish. I need your help optimizing it to reduce the processing time.

 

Most of the time is spent because the code sorts the datasets many times, and every sort reads and writes the dataset on the machine's hard disk.

I tried the TAGSORT option, but it didn't reduce the time.

I cannot attach both datasets here because the form only allows a single attachment, so I am sharing a cloud link to the data instead:

 

https://1drv.ms/f/s!AmgOKgExCpSglDb7WLBJUc4afstl

 

Please help me reduce the run time.

 

Code:

 

proc printto log= 'G:\Test\l.txt'/*'C:\Users\RKD\Desktop\AT_Analysis\l.txt'*/ new ; 
run;
proc printto print='G:\Test\o.txt'/*'C:\Users\RKD\Desktop\AT_Analysis\o.txt'*/ new;
run;

data list; 
input names $ 20.;
if index(names,'&') then names=tranwrd(names,'&','and');
if index(names,'-') then names=tranwrd(names,'-','d');
if index(names,'3') then names=tranwrd(names,'3','t'); 
datalines; 
POWERGRID
;

proc sql noprint; 
select count(*) into :cnt from list; 
select distinct(names) into: mlist separated by ':' from list; 
quit; 
%macro omatching; 
%do j=1 %to &cnt %by 1; 
%let compname=%sysfunc(scan(&mlist,&j,':'));
%if &compname=IdFLEX %then %let symbol=%str(I-FLEX);
%else %let symbol=&compname;
%put &compname;

data &compname._trades;
set mac.&compname.trades;
tno=trdnum;
bno=buyordnum;
sno=sellordnum;
format tradetime time8.;
*tt=tradetime-'09:15:00't;
tp=tradeprice;
tq=trdqty;
run;
data &compname._trades;
set &compname._trades;
tt=tradetime-'09:15:00't;
keep tt tp tq bno sno tno;
run;
data &compname._orders;
set mac.&compname.orders;
rename ordnum = orno;
rename oactivity=acttype;
rename otype=bors;
rename VolDisclosed=vd;
rename oqty = vo;
rename ordprice=lp;
rename MLOrdInd=mkt;
rename Otime=otm;
rename ordtime = time;
format time time8.;
if stploss='N';
run;
data &compname._b;
set &compname._orders;
if bors='B';
bt=time-'09:15:00't;
run;
data &compname._s;
set &compname._orders;
if bors='S';
st=time-'09:15:00't;
run;

data &compname._b;
set &compname._b;
bp=lp;
bq=vo;
bdq=vd;
bqo=vo;
btype=acttype;
borno=orno;
btime=time;
if time=. then delete;
keep borno btype bp bq bdq bqo btime bt ;
run;
data &compname._s;
set &compname._s;
if time=. then delete;
sp=lp;
sq=vo;
sdq=vd;
sqo=vo;
stype=acttype;
sorno=orno;
stime=time;
keep sorno stype sp sq sdq sqo stime st ;
run;
%do i=1 %to 22501 %by 60; 
%let tm=&i;
data &compname._bord&i;
set &compname._b;
if &tm-60 < bt <= &tm;
bno=borno;
run;
data &compname._sord&i;
set &compname._s;
if &tm-60 < st <= &tm;
sno=sorno;
run;
proc append base=&compname._bsnap data=&compname._bord&i force;
run;
proc append base=&compname._ssnap data=&compname._sord&i force;
run;
data _null_; 
if 0 then set &compname._bord&i nobs=num1; 
call symput('num1',put(num1,8.-l)); 
stop; 
run; 
data _null_; 
if 0 then set &compname._sord&i nobs=num2; 
call symput('num2',put(num2,8.-l)); 
stop; 
run; 
%macro borders;
proc sort data=&compname._bsnap TAGSORT;
by borno;
run;
data &compname._bsnap;
set &compname._bsnap;
by borno;
if last.borno;
run;
data &compname._bsnap;
set &compname._bsnap;
if btype=3 then delete;
run;
%mend borders;
%macro sorders;
proc sort data=&compname._ssnap TAGSORT;
by sorno;
run;
data &compname._ssnap;
set &compname._ssnap;
by sorno;
if last.sorno;
run;
data &compname._ssnap;
set &compname._ssnap;
if stype=3 then delete;
run;
%mend sorders;

%if &num1 > 0 %then %do; 
%borders;
%end; 
%if &num2 > 0 %then %do; 
%sorders;
%end;
data &compname._trades1;
set &compname._trades;
if &tm-60 < tt <= &tm;
run;
proc sort data=&compname._trades1 TAGSORT;
by bno;
run;
proc means data=&compname._trades1 sum;
var tq;
by bno;
output out=trab sum=stqb;
run;
data trab;
set trab;
drop _TYPE_ _FREQ_;
run;
proc sort data=trab TAGSORT;
by bno;
run;
proc sort data=&compname._trades1 TAGSORT;
by bno;
run;
proc sort data=&compname._bsnap TAGSORT;
by bno;
run;
data &compname._bsnap&i;
merge trab &compname._bsnap;
by bno;
run;
data &compname._bsnap&i;
set &compname._bsnap&i;
*if bno=borno then do;
if stqb=. then stqb=0;
bq=bq-stqb;
* end;
keep borno btype bp bq bdq bqo btime bt bno;
run;
data &compname._bsnap&i;
set &compname._bsnap&i;
if bq <= 0 then delete;
if bt=. then delete;
run;
proc sort data=&compname._bsnap&i TAGSORT;
by descending bp bt;
run;
data &compname._bsnap;
set &compname._bsnap&i;
run;
proc sort data=&compname._trades1 TAGSORT;
by sno;
run;
proc means data=&compname._trades1 sum;
var tq;
by sno;
output out=tras sum=stqs;
run;
data tras;
set tras;
drop _TYPE_ _FREQ_;
run;
proc sort data=tras TAGSORT;
by sno;
run;
proc sort data=&compname._trades1 TAGSORT;
by sno;
run;
proc sort data=&compname._ssnap TAGSORT;
by sno;
run;
data &compname._ssnap&i;
merge tras &compname._ssnap;
by sno;
run;
data &compname._ssnap&i;
set &compname._ssnap&i;
*if sno=sorno then do;
if stqs=. then stqs=0;
sq=sq-stqs;
* end;
keep sorno stype sp sq sdq sqo stime st sno;
run;
data &compname._ssnap&i;
set &compname._ssnap&i;
if sq<=0 then delete;
if st=. then delete;
run;
proc sort data=&compname._ssnap&i TAGSORT;
by sp st;
run;
data &compname._ssnap;
set &compname._ssnap&i;
run;
proc sort data=&compname._bsnap&i noduprecs TAGSORT;
by bno;
run;
proc sort data=&compname._bsnap&i TAGSORT;
by descending bp bt;
run;
proc sort data=&compname._ssnap&i noduprecs TAGSORT;
by sno;
run;
proc sort data=&compname._ssnap&i TAGSORT;
by sp st;
run;
data &compname._snap&i;
merge &compname._bsnap&i &compname._ssnap&i;
run;
data &compname._snap&i;
set &compname._snap&i;
tm1=&i;
date='01Jul2013'd;
format date date9.;
run;
proc append base=&compname._snap data=&compname._snap&i;
run;
data &compname._snap;
set &compname._snap;
tm= tm1+'09:15:00't;
format tm time8.;
run;
proc append base=mac.&compname._snap data=&compname._snap;
run;
proc sort data=mac.&compname._snap TAGSORT;
by date tm1;
run;
proc datasets nodetails nolist;
delete &compname._bord&i;
delete &compname._sord&i;
delete &compname._ssnap&i;
delete &compname._bsnap&i;
delete &compname._snap&i;

quit;
%end;
%end;
%mend omatching;
%omatching;

/* The Above code takes almost 16 Hrs to run */

/* The output of the above code will be used to compute the minute by minute spread of the stock. The spread computation code is shown below */

%macro spread(compname);
proc means data=AT.&compname._snap;
by tm;
var bp sp;
output out=AT.&Compname._bq max(bp)=bestbuy min(sp)=bestsell;
run;
%mend;
%spread(POWERGRID);

 

Super User
Posts: 7,942

Re: How to reduce the time taken in running the following code.

Posted in reply to rkdubey84

Hi,

 

This is a question and answer forum, not a contract work request system.  We don't have the time to debug your process.  It is up to you to run through your code, step by step, and see what can be sped up or improved.

I have glanced over the first few blocks, and yes, there is quite a bit that can be re-programmed, and a good dose of code formatting would certainly make it a lot easier to read.  Some tips:

data list; 
  input names $ 20.;
  if index(names,'&') then names=tranwrd(names,'&','and');
  if index(names,'-') then names=tranwrd(names,'-','d');
  if index(names,'3') then names=tranwrd(names,'3','t'); 
  /* %nrstr delays the macro until this step has finished, so CALL SYMPUT values set inside the macro resolve at the right time */
  call execute(cats('%nrstr(%omatching)(name=',names,');'));
datalines;
POWERGRID
;
run;

This would remove the need to create macro variables, do macro loops and the like.  The macro (which would need a name= parameter) is called once per observation with the value of the names variable.

data &compname._trades;
  set mac.&compname.trades;
  tno=trdnum;
  bno=buyordnum;
  sno=sellordnum;
  format tradetime time8.;
  tp=tradeprice;
  tq=trdqty;
run;

Why, in the above, are you setting new variables to be the same as existing ones?  This just duplicates the data and makes the datasets bigger than necessary.  Use rename= instead.

 

data &compname._trades;
  set &compname._trades;
  tt=tradetime-'09:15:00't;
  keep tt tp tq bno sno tno;
run;

Why is this step needed at all?  The tt= assignment and the keep can both go in the previous datastep, saving you a whole datastep.
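A minimal sketch of the two points above combined into one step. It assumes the variable names from the posted code (trdnum, buyordnum, sellordnum, tradeprice, trdqty, tradetime) and that &compname is already set; it has not been run against the real data.

```sas
/* Sketch only: one pass over the trades instead of two.            */
/* keep= is applied before rename=, so keep= lists the old names.   */
data &compname._trades;
  set mac.&compname.trades
      (keep=trdnum buyordnum sellordnum tradeprice trdqty tradetime
       rename=(trdnum=tno buyordnum=bno sellordnum=sno
               tradeprice=tp trdqty=tq));
  /* seconds elapsed since 09:15:00, as in the original code */
  tt = tradetime - '09:15:00't;
  keep tt tp tq bno sno tno;
run;
```

The output has the same six variables as the original pair of steps, but the source data is read once and no duplicate columns are written.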

 

Anyway, that is where I stop.  I would strongly suggest you start by taking one example, dropping all the macro code, and stepping through your code one step at a time, running just Base SAS code.  Identify what each datastep or procedure does and whether it can be improved or dropped completely.  Once the code is as good as you can get it, apply the macro part again.  This is all very symptomatic of coding without a plan: planning your code and modelling your data according to that plan removes most of the programming effort.

Contributor
Posts: 36

Re: How to reduce the time taken in running the following code.

Thanks RW9 for taking out valuable time from your busy schedule and trying to help me out.

 

I would like to assure you that what I am doing or requesting is not a part of "Contract Work Request System" kind of thing. 

 

My question was fairly simple and indicated where the majority of the problem lies: "Most of the time is spent because the code sorts the datasets many times, and every sort reads and writes the dataset on the machine's hard disk."

 

I attached the entire code and dataset so that kind and helpful people like you could understand the problem better, and also because the request form suggests including the relevant data and code. I am extremely sorry if it appeared otherwise to you.

 

The major problem starts at the following part of the code, where the datasets are sorted several times on every pass through the loop.

%do i=1 %to 22501 %by 60;

So, basically, I'm looking for suggestions on efficient sorting and on reading and writing data. I tried the TAGSORT option, but it didn't help, and I couldn't work out how to use an index in place of sorting in this code.

 

I am using the above code for analysing a part of my PhD Thesis work. I am not a SAS Certified professional and am learning bits and pieces from different sources including support.sas.com and communities.sas.com.

 

Thank You for your patience, time and help.

Good Day. 

 

Super User
Posts: 7,942

Re: How to reduce the time taken in running the following code.

Posted in reply to rkdubey84

Well, it's not just a case of applying an option to a function.  The whole program needs addressing, not to mention the data modelling behind it.  For instance, the part you raise:

%do i=1 %to 22501 %by 60; 

What is this?  Why 22501, why 60?  Why loop at all?  It looks like you are blocking out hourly data; why?  Why not just create a variable with the hour, then do your processing using a by group?  That would be as fast as possible, and simpler to code:

data want;
  set have;
  hour=floor(time / 3600);  /* SAS time values are in seconds; change the divisor to suit the interval you need */
run;

proc means data=want;
  by hour;
  var xyz;
run;

So in the above example (not tied to your one), I create a variable called hour, which would be 1, 2, 3 etc. depending on the time.  Then in the means I use this hour variable as a by group - no looping, and using the Base SAS functionality.

 

Note that the above is just one example.  You would need to go through the whole program and make sure the data is structured in the most efficient manner; there really is a lot that can be stripped out or optimised.  For instance, as I was scrolling down I noticed this:

data &compname._bsnap&i;
set &compname._bsnap&i;
if bq <= 0 then delete;
if bt=. then delete;
run;

This could be re-written more efficiently as:

data &compname._bsnap&i.;
  set &compname._bsnap&i. (where=(bq > 0 and bt ne .));  /* missing bq fails bq > 0, matching the original deletes */
run;

But this also begs the question why this datastep is needed at all, it can just be included in one of the other steps.
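As a rough sketch of folding that filter into an earlier step: the merge, the quantity adjustment and the filter from the posted code could share one datastep. Variable and dataset names are taken from the original post; the filter uses a subsetting IF here because bq is recalculated inside the step, so a WHERE on the input would test the wrong value.

```sas
/* Sketch only: replaces the merge step and the two steps after it. */
data &compname._bsnap&i;
  merge trab &compname._bsnap;
  by bno;
  if stqb = . then stqb = 0;   /* no trades against this order */
  bq = bq - stqb;              /* quantity left after trades   */
  /* keep rows with quantity remaining and a valid time offset */
  if bq > 0 and bt ne .;
  keep borno btype bp bq bdq bqo btime bt bno;
run;
```

This writes the dataset once instead of three times per loop iteration.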

 

Simply put, if you want an efficient program, then debugging is a step by step process.  If you just want things to run faster, then the simple answer is use smaller datasets or get a more powerful computer. 

Super User
Posts: 11,343

Re: How to reduce the time taken in running the following code.

Why are there in excess of 88,000 data sets created? Lots of disk access for that.

It really looks like someone does not know about SAS BY processing (and possibly the related First. and Last. processing) and/or how to calculate offsets or the values of variables to hold those by values.

 

Look at the code involving ._bord&i sets:

 

data &compname._bord&i;
set &compname._b;
if &tm-60 < bt <= &tm;
bno=borno;
run;

proc append base=&compname._bsnap data=&compname._bord&i force;
run;

This looks like 1) subset the data from one set, set one variable and then 2) append them all to one set.

 

I am not quite sure why this is better than calculating the minimum value of &tm and the corresponding maximum, and generating ONE set.
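A sketch of what generating one set might look like. The variable name minute and the dataset name &compname._b_all are made up for illustration. The formula reproduces the loop's windows (&tm-60 < bt <= &tm with &tm = 1, 61, 121, ...), so minute 1 corresponds to &tm=1, minute 2 to &tm=61, and so on. It only covers the subsetting and appending part; the later order-book logic would still need to be reworked around BY processing.

```sas
/* Sketch only: one pass instead of one subset + append per &tm. */
data &compname._b_all;
  set &compname._b;
  /* same overall range the loop covers: -59 < bt <= 22501 */
  where bt > -59 and bt <= 22501;
  /* window number: 1 for &tm=1, 2 for &tm=61, ... */
  minute = ceil((bt + 59) / 60);
  bno = borno;
run;
```

The original &tm value can be recovered as 60*(minute-1)+1 if it is needed later.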

 

Also, it looks like you compile the macros %borders and %sorders inside that loop on every iteration, several hundred times in all. It would be better to pull those macros out, define them with appropriate parameters, and compile them once.
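One way this could look, as a sketch: a single parameterised macro, defined once outside %omatching, that does the job of both %borders and %sorders. The macro name lastorder, its parameters and the demo data are invented for illustration. TAGSORT is left out on the assumption that the sort fits in memory.

```sas
/* Defined once, before %omatching is compiled or called. */
%macro lastorder(ds=, key=, type=);
  proc sort data=&ds;
    by &key;
  run;
  data &ds;
    set &ds;
    by &key;
    /* keep the last record per order unless its activity type is 3 */
    if last.&key and &type ne 3;
  run;
%mend lastorder;

/* Tiny made-up data set to show the call. */
data demo_bsnap;
  input borno btype bp;
  datalines;
101 1 50.5
102 1 50.0
101 3 50.5
103 1 49.9
102 2 50.1
;

%lastorder(ds=demo_bsnap, key=borno, type=btype);
```

With the demo data, order 101 is dropped because its last record has type 3, while orders 102 and 103 remain. Inside the loop, the calls would become %lastorder(ds=&compname._bsnap, key=borno, type=btype) and %lastorder(ds=&compname._ssnap, key=sorno, type=stype).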

Frequent Contributor
Posts: 129

Re: How to reduce the time taken in running the following code.

Posted in reply to rkdubey84

Hi,

 

Format your code, indent the sections, and follow these guidelines for a start:

Guidelines for Coding of SAS® Programs

50 Ways to Make Your SAS® Code Execute More Efficiently

Top Ten SAS ® Performance Tuning Techniques

 

Check how big your datasets are, and try to use a memlib instead of work.

You need to reduce the I/O operations and the amount of data being moved around.

Use where instead of if.
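A short illustration of the difference, using the buy-order subset from the posted code. The output names b_if and b_where are placeholders, and &compname is assumed to be set already.

```sas
/* Subsetting IF: every row is brought into the step, then tested. */
data b_if;
  set &compname._orders;
  if bors = 'B';
run;

/* WHERE: rows are filtered as they are read from the input. */
data b_where;
  set &compname._orders (where=(bors = 'B'));
run;
```

WHERE can only test variables that already exist in the input dataset, so a value calculated inside the step still needs IF.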

 

Then, if that doesn't cut the processing time at least in half, ask again and attach your log and the text files.

 

Cheers


Super User
Posts: 5,500

Re: How to reduce the time taken in running the following code.

Posted in reply to rkdubey84

There is precious little you can do to optimize this code.  Really, you need to rip it up and rewrite it.  This logic processes the same data sets 375 times instead of once.  You need to figure out what it is supposed to do, then give the task to a SAS programmer who understands how to use a BY statement.  That being done, I would be surprised if the final version took more than 30 minutes to run.

 

You could tweak the existing code slightly by replacing some IF statements with WHERE statements.  But that would only make a minor impact.  If the program runs without TAGSORT, remove it.  TAGSORT takes longer to run, but allows programs to run that would otherwise run out of memory when sorting.

Contributor
Posts: 36

Re: How to reduce the time taken in running the following code.

Posted in reply to rkdubey84

Thank you @RW9 @ballardw @Astounding @Oligolas for your valuable inputs. I'll go ahead and try the suggestions you have made and will update you all on the progress.

 

Thank you once again for being so patient and helpful.
