aminkarimid
Lapis Lazuli | Level 10

Hello everybody,
I have chosen SAS for the technical analysis in my thesis.
I wrote the code shown below. I would like to rewrite it so that it is simpler and better structured, the way a semi-professional programmer would write it. However, I do not have enough programming knowledge to do this myself.
Here is my code:

***********************************
*STEP 1: ROUNDING TIME;
***********************************
;
data Sampledata87_RT;
set Sampledata87;
TRD_EVENT_TIME = INPUT(TRD_EVENT_TM,time16.);
TRD_EVENT_ROUNDED = ROUND(TRD_EVENT_TIME,'00:30't);
TRD_EVENT_ROUFOR = PUT(TRD_EVENT_ROUNDED,hhmm.);

***********************************
*STEP 2: CALCULATING INTRADAY VOLUME;
***********************************
;
CountedVOLUME = TRD_PR*TRD_TUROVR;

***********************************
*STEP 3: CALCULATING NORMALIZED VOLUME;
***********************************
;

*Denominator
/*Sort by TRD_STCK_CD and temporal variables.*/;
proc sort data=Sampledata87_RT out=Sampledata87_SumVol;
    by  TRD_EVENT_DT;
run;

/*Sum VOLUME until the last of each TRD_STCK_CD is reached.*/
data Sampledata87_SumVolSo;
    set Sampledata87_SumVol;
    by  TRD_EVENT_DT
		TRD_STCK_CD notsorted;
	format TRD_STCK_CD  $5.;
	informat TRD_STCK_CD  $5.;
    retain tmp_volume_sum;
    tmp_volume_sum + CountedVOLUME;
    if last.TRD_STCK_CD then do;
        DailyVolume = tmp_volume_sum;
        call missing(tmp_volume_sum);
    end;
    drop tmp_:;
run;

*The numerator
/*Sum VOLUME until the last of each TRD_EVENT_ROUFOR is reached.*/;
data Sampledata87_SumVolSo;
    set Sampledata87_SumVolSo;
    by  TRD_EVENT_DT
		TRD_STCK_CD
		TRD_EVENT_ROUFOR notsorted;
    retain tmp_intradayvolume_sum;
    tmp_intradayvolume_sum + CountedVOLUME;
    if last.TRD_EVENT_ROUFOR then do;
        IntradayVolume = tmp_intradayvolume_sum;
        call missing(tmp_intradayvolume_sum);
    end;
    drop tmp_:;
run;

* Another way for calculating daily volume based on data set;
/*
proc sql noprint;
	create table sums as
	select TRD_STCK_CD, TRD_EVENT_DT, sum(CountedVOLUME) as volume_sum
	from Sampledata87_SumVolSo
	group by TRD_STCK_CD, TRD_EVENT_DT;

	create index TRD_STCK_CD on sums;
quit;

data Sampledata87_SumVolSo02;
	set Sampledata87_SumVolSo;
	by  TRD_EVENT_DT
		TRD_STCK_CD notsorted;
	volume_sum = .;
	if last.TRD_STCK_CD then
    set sums key=TRD_STCK_CD;
run;
*/;

*Approach 1: Calculating Daily Volume by Data set;

*Division for calculating adjusted volume in approach 1;
proc sort data=sampledata87_sumvolso out=sampledata87_sumvolso;
by TRD_STCK_CD TRD_EVENT_DT;
run;
 
data sampledata87_adjvol;
     do until(last.TRD_STCK_CD);
           do until(last.TRD_EVENT_DT);
                set sampledata87_sumvolso;
                by TRD_STCK_CD TRD_EVENT_DT;
 
                if first.TRD_STCK_CD then
                     n=0;
 
                if first.TRD_EVENT_DT then
                     n+1;
 
                if n>1 then
                     do;
                           if not missing(IntradayVolume) then
                                adjusted_volume=divide(IntradayVolume,temp);
                                else call missing(adjusted_volume);
                     end;
 
                if last.TRD_EVENT_DT then
                     temp=dailyvolume;
                output;
           end;
     end;
     drop temp n;
run;
 
proc sort data = sampledata87_adjvol;
by TRD_EVENT_DT TRD_STCK_CD;
run;


*Approach 2: Calculating daily volume by merging tables;

*Changing name & format of table 2 for coordination;
data sampledata87_02;
	set sampledata87_02;
	Options VALIDVARNAME=ANY;
	rename
	instrument = TRD_STCK_CD
	Trade_Date = TRD_EVENT_DT;
run;


*Merging tables;
proc sort data=Sampledata87_sumvolso; by TRD_EVENT_DT TRD_STCK_CD; run;
proc sort data=Sampledata87_02; by TRD_EVENT_DT TRD_STCK_CD; run;

data Sampledata87_02_Mer;
	merge Sampledata87_sumvolso Sampledata87_02;
	by TRD_EVENT_DT TRD_STCK_CD;
	keep TRD_EVENT_DT TRD_EVENT_TM TRD_STCK_CD TRD_EVENT_ROUNDED TRD_EVENT_ROUFOR CountedVOLUME Volume IntradayVolume;
run;

*Division for calculating normalized volume in approach 2;

proc sort data=Sampledata87_02_Mer out=Sampledata87_02_Mer;
by TRD_STCK_CD TRD_EVENT_DT;
run;
 
data Sampledata87_02_Mer;
     do until(last.TRD_STCK_CD);
           do until(last.TRD_EVENT_DT);
                set Sampledata87_02_Mer;
                by TRD_STCK_CD TRD_EVENT_DT;
 
                if first.TRD_STCK_CD then
                     n=0;
 
                if first.TRD_EVENT_DT then
                     n+1;
 
                if n>1 then
                     do;
                           if not missing(IntradayVolume) then
                                adjusted_volume=divide(IntradayVolume,temp);
                                else call missing(adjusted_volume);
                     end;
 
                if last.TRD_EVENT_DT then
                     temp=volume;
                output;
           end;
     end;
     drop temp n;
run;
 
proc sort data = Sampledata87_02_Mer;
by TRD_EVENT_DT TRD_STCK_CD;
run;


***********************************
STEP 4: REGRESSING DUMMY VARIABLES ON NORMALIZED VOLUME VARIABLE USING AUTOMATICALLY GENERATING DUMMY VARIABLE METHOD
***********************************
;
* Regression with dummy variables in approach 1;
* Regressing dummy variables on normalized volume variable using calculated volume;
proc genmod data=Sampledata87_adjvol;
   class TRD_EVENT_ROUFOR / param=effect;
   model adjusted_volume = TRD_EVENT_ROUFOR / noscale;
   ods select ParameterEstimates;
run;


* Regression with dummy variables in approach 2;
* Regressing dummy variables on normalized volume variable using merged table; 

proc genmod data=Sampledata87_02_mer;
   class TRD_EVENT_ROUFOR / param=effect;
   model adjusted_volume = TRD_EVENT_ROUFOR / noscale;
   ods select ParameterEstimates;
run;

Please help me think this through.

Thank you in advance for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User

In terms of efficiency, you need to avoid doing things twice. For example, you copy a dataset just to rename a couple of variables, and then you later sort it. You could either use PROC DATASETS to modify the original dataset, which avoids having to read and write the data just to rename the variables, or you could add the RENAME= dataset option to the input of your PROC SORT.
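A minimal sketch of the two options described above, using the dataset and variable names from the question. The WORK library and the output name sampledata87_02_sorted are assumptions for illustration only. Use one option or the other: once option A has run, the old variable names no longer exist, so the RENAME= list in option B would no longer apply.

```sas
/* Option A: rename in place with PROC DATASETS.                      */
/* Only the dataset header is changed; the rows are not re-read or    */
/* rewritten. Assumes the dataset lives in the WORK library.          */
proc datasets library=work nolist;
    modify sampledata87_02;
    rename instrument = TRD_STCK_CD
           Trade_Date = TRD_EVENT_DT;
quit;

/* Option B: rename on the way into the sort that is needed anyway.   */
/* The BY statement uses the new names because RENAME= is applied     */
/* when the input dataset is read.                                    */
/* sampledata87_02_sorted is a hypothetical output name.              */
proc sort data=sampledata87_02(rename=(instrument = TRD_STCK_CD
                                       Trade_Date = TRD_EVENT_DT))
          out=sampledata87_02_sorted;
    by TRD_EVENT_DT TRD_STCK_CD;
run;
```

Either way, the separate data step that only renames the two variables is no longer needed.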

 

Also avoid re-sorting datasets. Sorting can take a really long time, especially for large datasets.

For example, you sort and merge by TRD_EVENT_DT TRD_STCK_CD and then later re-sort by TRD_STCK_CD TRD_EVENT_DT. If you can process the data in the same order both times, you can avoid having to re-sort it.
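As a rough sketch of that idea, the merge in approach 2 could be done directly in stock/date order, which is the order the later division step reads. This assumes Sampledata87_02 already carries the renamed variables and that no later step depends on date/stock order; it has not been tested against the real data.

```sas
/* Sort both inputs once, in the order the later steps need. */
proc sort data=Sampledata87_sumvolso;
    by TRD_STCK_CD TRD_EVENT_DT;
run;

proc sort data=Sampledata87_02;
    by TRD_STCK_CD TRD_EVENT_DT;
run;

/* The merged output is already in stock/date order, so the      */
/* PROC SORT that followed the merge in the original program     */
/* can be dropped.                                               */
data Sampledata87_02_Mer;
    merge Sampledata87_sumvolso Sampledata87_02;
    by TRD_STCK_CD TRD_EVENT_DT;
    keep TRD_EVENT_DT TRD_EVENT_TM TRD_STCK_CD TRD_EVENT_ROUNDED
         TRD_EVENT_ROUFOR CountedVOLUME Volume IntradayVolume;
run;
```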

 

In general, if your program runs, you could turn on the FULLSTIMER option, run the code, look for the steps that take the longest, and concentrate on improving those first. There is not much sense in working hard to speed up something that only takes a second.
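Turning the option on and off could look like the sketch below. The PROC SORT in the middle is only a placeholder taken from the question; in practice the whole program would go there.

```sas
/* Write detailed timing and resource notes to the log for every step. */
options fullstimer;

/* Placeholder step: the real program goes here. */
proc sort data=Sampledata87_RT out=Sampledata87_SumVol;
    by TRD_EVENT_DT;
run;

/* Return to the shorter default timing notes. */
options nofullstimer;
```

The log then shows real time and CPU time for each DATA step and procedure, which makes the slowest steps easy to spot.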


8 REPLIES
art297
Opal | Level 21

Not sure what kind of help you are asking for. Is your code documented as well as would be expected from a professional programmer? Yes!

 

Does it do what you want? Only you can answer that question!

 

Are there some things you could correct or simplify? There almost always are, even in production code from professional programmers! Some of them fall under coding preferences (that is, things that don't change the way a program runs, but which some of us expect to see in code). For example:

(1) You don't always end a data step with a run; statement. I always like to see such boundaries when reviewing code.

(2) While you use the implied sum statement, e.g.,

tmp_volume_sum + CountedVOLUME;

you also include a retain tmp_volume_sum statement. It's not needed, because the sum statement automatically retains the variable.

(3) You have a couple of data steps where you don't take advantage of SAS's normal method of processing, e.g.:

data sampledata87_adjvol;
     do until(last.TRD_STCK_CD);
           do until(last.TRD_EVENT_DT);
                set sampledata87_sumvolso;
                by TRD_STCK_CD TRD_EVENT_DT;

Without seeing your data and testing whether your approach does anything differently, my guess is that something like the following does the same thing:

data sampledata87_adjvol;
  set sampledata87_sumvolso;
  by TRD_STCK_CD TRD_EVENT_DT;
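To make points (2) and (3) concrete, here is one way the full step might look with the implicit data step loop. This is a sketch that has not been run against the real data. It assumes the input is sorted by TRD_STCK_CD TRD_EVENT_DT and that adjusted_volume does not already exist in the input dataset.

```sas
data sampledata87_adjvol;
    set sampledata87_sumvolso;
    by TRD_STCK_CD TRD_EVENT_DT;

    /* temp is set by plain assignment, so it needs an explicit RETAIN. */
    /* n is updated by a sum statement, which retains it automatically. */
    retain temp;

    /* Start of a new stock: reset the day counter and the stored volume. */
    if first.TRD_STCK_CD then do;
        n = 0;
        call missing(temp);
    end;

    /* Count the trading days seen so far for this stock. */
    if first.TRD_EVENT_DT then n + 1;

    /* From the second day on, divide by the previous day's volume.   */
    /* adjusted_volume is reset to missing at the top of every        */
    /* iteration, so no ELSE branch is needed.                        */
    if n > 1 and not missing(IntradayVolume) then
        adjusted_volume = divide(IntradayVolume, temp);

    /* Keep this day's total for use on the next day's rows. */
    if last.TRD_EVENT_DT then temp = DailyVolume;

    drop temp n;
run;
```

As in the original, the denominator is the daily volume stored at the end of the previous day, and every input row is written to the output.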

Art, CEO, AnalystFinder.com

 

 

aminkarimid
Lapis Lazuli | Level 10
Thanks art297,
My code is correct and gives me everything I want. However, I have a bad feeling about it, because I work with big data and run time matters to me. I think my code should be much more efficient.
How can I do that?
Reeza
Super User

How big is your data, and how long do your processes currently take?

Which parts are inefficient?

aminkarimid
Lapis Lazuli | Level 10
Thanks Reeza,
I don't know which parts are inefficient.
Please give me tips for rewriting my code, such as combining or omitting steps.
Reeza
Super User

@aminkarimid wrote:
Thanks Reeza,
I don't know which parts are inefficient.
Please give me tips for rewriting my code, such as combining or omitting steps.

I'm going to strongly agree with @ballardw here. It's more important to fully understand your code, what it does, and how to change it than for it to be efficient. Since you're new to SAS and analytics, I would suggest making sure you understand what every single line of your code does. It seems like overkill, but commenting each line is a good exercise. When you do this, you usually see where steps are redundant, because you're tracing the process. The other thing that's important is documentation, especially if you did any data manipulation outside of SAS.
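As an illustration of the line-by-line commenting exercise, here is how steps 1 and 2 of the question might look. The comments are a reading of the code, not confirmed by the original poster. In particular, that TRD_EVENT_TM is a character variable and that TRD_PR and TRD_TUROVR hold price and turnover are assumptions based on the names and on how they are used.

```sas
data Sampledata87_RT;
    /* Read every trade record from the raw sample. */
    set Sampledata87;

    /* Convert the time text to a SAS time value (seconds since midnight). */
    TRD_EVENT_TIME = input(TRD_EVENT_TM, time16.);

    /* Round to the nearest 30-minute mark ('00:30't is 1800 seconds). */
    TRD_EVENT_ROUNDED = round(TRD_EVENT_TIME, '00:30't);

    /* Store the rounded time as text such as 9:30, used later as the */
    /* class variable in PROC GENMOD.                                  */
    TRD_EVENT_ROUFOR = put(TRD_EVENT_ROUNDED, hhmm.);

    /* Value traded in this record: price times turnover (assumed meaning). */
    CountedVOLUME = TRD_PR * TRD_TUROVR;
run;
```

Writing the step this way also gives it an explicit run; boundary.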

art297
Opal | Level 21

You could likely reduce your use of PROC SORT. However, without seeing your data, one can't be sure. For one, the sort before the PROC GENMOD steps at the end of your code doesn't seem to be needed.

 

Art, CEO, AnalystFinder.com

 

ballardw
Super User

My $0.02

"Efficiency" is a slippery beast. You may need to decide which of these matter most: 1) run time; 2) disk space, network bandwidth, or other resource constraints; 3) code writing time; and 4) code maintenance.

 

Some code that is very efficient in run time may require lots of disk space or be somewhat difficult to understand (requiring much more time to maintain or change).

 

Simple code may take more time to run but is easier to maintain.
