Re: Code efficiency

LinderVogel · Posted 03-22-2016 02:33 PM

Hello everyone, I would like you to take a look at this code and tell me if you consider it to be an effiecent way to write out the variables and fomulas. Thanks.

DATA M.CHAMPS2;
SET M.CHAMPS;

ARRAY CMP_ALL {*} CHAMPS7HOURS CHAMPS9HOURS CHAMPS10HOURS CHAMPS14HOURSNEW CHAMPS15HOURSNEW CHAMPS16HOURS CHAMPS19HOURS CHAMPS20HOURS CHAMPS21HOURS CHAMPS22HOURS CHAMPS23HOURS CHAMPS24HOURS CHAMPS25HOURS CHAMPS26HOURS CHAMPS27HOURS CHAMPS28HOURS CHAMPS29HOURS CHAMPS30HOURS CHAMPS31HOURS CHAMPS32HOURS CHAMPS33HOURSNEW CHAMPS34HOURS CHAMPS35HOURS CHAMPS36HOURS CHAMPS37HOURS CHAMPS38HOURS CHAMPS39HOURS CHAMPS40HOURS;

ARRAY CMP_ALL_FREQ {*} CHAMPS7TIMES CHAMPS9TIMES CHAMPS10TIMES CHAMPS14TIMESNEW CHAMPS15TIMESNEW CHAMPS16TIMES CHAMPS19TIMES CHAMPS20TIMES CHAMPS21TIMES CHAMPS22TIMES CHAMPS23TIMES CHAMPS24TIMES CHAMPS25TIMES CHAMPS26TIMES CHAMPS27TIMES CHAMPS28TIMES CHAMPS29TIMES CHAMPS30TIMES CHAMPS31TIMES CHAMPS32TIMES CHAMPS33TIMESNEW CHAMPS34TIMES CHAMPS35TIMES CHAMPS36TIMES CHAMPS37TIMES CHAMPS38TIMES CHAMPS39TIMES CHAMPS40TIMES;

DO I=1 TO 28;
SELECT(CMP_ALL(I));
WHEN (.) CMP_ALL(I)=0;
WHEN (1) CMP_ALL(I)=.5;
WHEN (2) CMP_ALL(I)=1.75;
WHEN (3) CMP_ALL(I)=3.75;
WHEN (4) CMP_ALL(I)=5.75;
WHEN (5) CMP_ALL(I)=7.75;
WHEN (6) CMP_ALL(I)=9.75;
OTHERWISE CMP_ALL(I)=9.75;
END;
END;

ARRAY METAB_WEIGHT{28} (4.5 3.0 2.0 6.0 4.0 4.5 3.0 2.5 4.0 2.25 3.0 7.0 6.0 3.5 2.5 2.5 4.0 5.0 3.0 5.0 3.0 2.0 2.0 3.5 4.5 3.0 2.5 5.0);
ARRAY CAL_EXPEN_WK {28};
DO I=1 TO 28;
CAL_EXPEN_WK(I)=CMP_ALL(I)*METAB_WEIGHT(I)*3.5*60*ACTUALWEIGHTKG/200;
END;
CAL_EXPEN_WKTOTAL= SUM(OF CAL_EXPEN_WK1-CAL_EXPEN_WK28);

ARRAY FREQWK {28};
DO I=1 TO 28;
FREQWK(I)=CMP_ALL_FREQ(I);
END;
FREQWKTOTAL=SUM(OF FREQWK1-FREQWK28);

ARRAY CAL_MOD {28};
DO I=1 TO 28;
IF METAB_WEIGHT[I]>=3 THEN CAL_MOD(I)=CAL_EXPEN_WK(I);
ELSE CAL_MOD(I)=0;
END;
CAL_MODTOT= SUM(OF CAL_MOD1-CAL_MOD28);

ARRAY CAL_MODFREQ {28};
DO I=1 TO 28;
IF METAB_WEIGHT[I]>=3 THEN CAL_MODFREQ(I)=FREQWK(I);
ELSE CAL_MODFREQ(I)=0;
END;
CAL_MODFREQTOT= SUM(OF CAL_MODFREQ1-CAL_MODFREQ28);
RUN;

PROC MEANS DATA=M.CHAMPS2 N MEAN STD SKEW;
VAR CAL_EXPEN_WKTOTAL FREQWKTOTAL CAL_MODTOT CAL_MODFREQTOT;
RUN;

ballardw · Posted 03-22-2016 03:24 PM

"Effencieny" could mean a number of things: run time, coding time and maintenance, disk space, other resources perhaps.

The only thing I see that would be obviously inefficient is that you have multiple loop of the Do i = 1 to 28; variety.

If I were worried about run time I might have them all in one loop and then calculate all the summaries after that loop but there isn't likely to be a noticeable difference in run time. Removing the 8 lines of code doesn't make the code noticeably smaller. One loop would be slightly easier to maintain if you add or remove items from the arrays. I might switch to use: Do i = 1 to dim(<any of the arrays of the same size>); instead of the literal 28.

Kurt_Bremser · Posted 03-22-2016 03:33 PM

Or simply replace 28 with a macro variable, so you can change all dimensions/loops in one place.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 03-22-2016 04:05 PM

Here's a line that can be eliminated:

WHEN (6) CMP_ALL(I)=9.75;

The OTHERWISE condition would take care of it. That being said, that's a tiny change. You might not even be able to measure the difference.

Here is a more constructive idea. You can can cut down on the number of loops, and increase program readability at the same time. Put all your array statements up front. Then consider this variation:

DO I=1 TO 28;
FREQWK(I)=CMP_ALL_FREQ(I);

IF METAB_WEIGHT[I]>=3 THEN DO;

CAL_MOD(I)=CAL_EXPEN_WK(I);

CAL_MODFREQ(I)=FREQWK(I);

END;

ELSE DO;

CAL_MOD(I)=0;

CAL_MODFREQ(I)=0;

END;
END;
FREQWKTOTAL=SUM(OF FREQWK1-FREQWK28);
CAL_MODTOT= SUM(OF CAL_MOD1-CAL_MOD28);
CAL_MODFREQTOT= SUM(OF CAL_MODFREQ1-CAL_MODFREQ28);

EDITED: In retrospect, this is essentially the same as ballardw's suggestion ... just spelling out the coding changes.

FreelanceReinh · Posted 03-22-2016 04:49 PM

Hello @LinderVogel,

What strikes me first is that in dataset M.CHAMPS there are two sets of 28 variables each, which could be more easily dealt with if the dataset had a vertical structure ("long" rather than "wide" format).

But let's assume that you can't change M.CHAMPS. So, this dataset contains at least 57 variables (the above 2*28 plus ACTUALWEIGHTKG). With your code M.CHAMPS2 will have (at least) 202 variables, many of which are either exact copies or only slight modifications of other variables. The case of FREQWK1-FREQWK28, which are copies of the elements of array CMP_ALL_FREQ, is most striking. This is what I would call inefficient.

Do you really need all these new variables or are you only aiming at the aggregated variables used in the PROC MEANS step at the end?

If you needed only the latter, you could perform the calculations without creating most of the new arrays (all in one DO loop, as ballardw pointed out). Thus, the data step could shrink from 40 to less than 25 lines of code. Also, arrays METAB_WEIGHT and CAL_EXPEN_WK could be candidates for _temporary_ arrays in this case. (Alternatively, the constants in METAB_WEIGHT could be stored in a separate dataset.) Dataset M.CHAMPS2 would then have only 61 variables (plus variables not shown from M.CHAMPS).

Code efficiency