deleted_user
Not applicable
Hi there
I've written an optimisation macro (comb) which I need to loop through 168000 times. Unsurprisingly it's taking a while - I'm just wondering if anyone has tips for making the code more efficient.


%let att1=12;
%let att2=5;
%let att3=5;
%let att4=4;
%let att5=4;
%let att6=7;
%let att7=5;


data comb;
do x1=1 to &att1;
do x2=1 to &att2;
do x3=1 to &att3;
do x4=1 to &att4;
do x5=1 to &att5;
do x6=1 to &att6;
do x7=1 to &att7;
output;
end;
end;
end;
end;
end;
end;
end;
run;

data comb;
set comb;
num=_n_;
run;

%macro comb(num);
data opt&num;
set comb;
*1st Attribute;
n1=1;
array a a1-a&att1;
do over a;
if x1=n1 then a=1;else a=0;
n1+1;
end;
*2nd Attribute;
n2=1;
array b b1-b&att2;
do over b;
if x2=n2 then b=1;else b=0;
n2+1;
end;
*3rd Attribute;
n3=1;
array c c1-c&att3;
do over c;
if x3=n3 then c=1;else c=0;
n3+1;
end;
*4th Attribute;
n4=1;
array d d1-d&att4;
do over d;
if x4=n4 then d=1;else d=0;
n4+1;
end;
*5th Attribute;
n5=1;
array e e1-e&att5;
do over e;
if x5=n5 then e=1;else e=0;
n5+1;
end;
*6th Attribute;
n6=1;
array f f1-f&att6;
do over f;
if x6=n6 then f=1;else f=0;
n6+1;
end;
*7th Attribute;
n7=1;
array g g1-g&att7;
do over g;
if x7=n7 then g=1;else g=0;
n7+1;
end;
*Add a dummy for merging;
dummy=1;
where num=&num;
drop n1 n2 n3 n4 n5 n6 n7;
run;
*Merge onto utility data;
data util_&num;
merge util opt&num;
by dummy;
tot1=0;
tot2=0;
tot3=0;
tot4=0;
tot5=0;
tot6=0;
tot7=0;
*1st Attribute;
array a att1_lev1-att1_lev&att1;
array b a1-a&att1;
do over a;
if a*b~=0 then tot1+a*b;
end;
*2nd Attribute;
array c att2_lev1-att2_lev&att2;
array d b1-b&att2;
do over c;
if c*d~=0 then tot2+c*d;
end;
*3rd Attribute;
array e att3_lev1-att3_lev&att3;
array f c1-c&att3;
do over e;
if e*f~=0 then tot3+e*f;
end;
*4th Attribute;
array g att4_lev1-att4_lev&att4;
array h d1-d&att4;
do over g;
if g*h~=0 then tot4=g*h;
end;
*5th Attribute;
array i att5_lev1-att5_lev&att5;
array j e1-e&att5;
do over i;
if i*j~=0 then tot5=i*j;
end;
*6th Attribute;
array k att6_lev1-att6_lev&att6;
array l f1-f&att6;
do over k;
if k*l~=0 then tot6=k*l;
end;
*7th Attribute;
array m att7_lev1-att7_lev&att7;
array n g1-g&att7;
do over m;
if m*n~=0 then tot7=m*n;
end;
total=sum (of tot1 tot2 tot3 tot4 tot5 tot6 tot7);
if total>none then pick=1;else pick=0;
run;
proc univariate data=util_&num noprint;
var pick;
output out=pick&num mean=mean;
*weight weight;
run;
data pick&num;
set pick&num;
dummy=1;
run;
data pick&num;
merge pick&num opt&num;
by dummy;
run;
%mend comb;

data util;
set util;
dummy=1;
run;


dm 'log;clear;autoscroll 0';


%macro dummy;
%do x=1 %to 168000;
%comb(&x);
%end;
%mend;

%dummy;

Any tips would be greatly appreciated!
11 REPLIES
data_null__
Jade | Level 19
I think your program can be made more efficient. But I would like to understand the big picture a bit. Please explain the process.

Your program is too macro loopy. You are doing five or so steps for each of the 168,000 observations in COMB; I would not expect that to ever finish. So "we" need to figure out how to do more work in fewer steps, processing all obs in COMB in each step.

It looks like the first step in the macro, which creates 168,000 data sets OPT&num, is creating dummy variables similar to a design matrix. If that is the case, this code will do that work for all obs in one step.

Let me know if this is going in the right direction and explain the overall process.

[pre]
data comb;
do x1=1 to &att1;
do x2=1 to &att2;
do x3=1 to &att3;
do x4=1 to &att4;
do x5=1 to &att5;
do x6=1 to &att6;
do x7=1 to &att7;
num + 1;
output;
end;
end;
end;
end;
end;
end;
end;
run;
proc glmmod data=comb noprint outdesign=opt;
class x1-x7;
model num=x1-x7 / noint;
run;
proc contents data=opt varnum;
run;
[/pre]
deleted_user
Not applicable
Thanks - I will try out this code.
You're right that the opt&num datasets are alternatives from a design matrix. The program is designed to take conjoint utilities and calculate the overall utility of a certain set of attributes. There are seven attributes - with number of levels defined by att1-att7. Overall there are 168000 possible combinations of attribute levels.
The util dataset contains the utilities for 2000 respondents - so what I'm doing (or trying to do!) is to merge on each design, calculate whether an individual would pick that configuration, and then calculate an average across everyone.
So the final output is an appeal score for each of the 168000 combinations.
Doing it all in one step is certainly something I'll be looking into.
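In other words, the appeal score for a design is just the mean, across respondents, of a 0/1 pick flag. A minimal sketch of that calculation, assuming a hypothetical dataset SCORED with one row per respondent-design pair holding NUM, TOTAL and NONE (the names PICKS, PICK and APPEAL are just for illustration):

[pre]
* flag whether each respondent would pick each design;
data picks;
set scored;
pick = (total > none);   * 1 if the respondent prefers this design to "none";
run;

* appeal score per design = mean of the pick flag across respondents;
proc summary data=picks nway;
class num;
var pick;
output out=appeal(drop=_type_ _freq_) mean=appeal;
run;
[/pre]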
data_null__
Jade | Level 19
Please show example of UTIL data or e-mail it to me at IEBUPDTE@GMAIL.COM.
deleted_user
Not applicable
This is what's in the UTIL dataset

respid - identifier for each respondent
att1_lev1-att1_lev12
att2_lev1-att2_lev5
att3_lev1-att3_lev5
att4_lev1-att4_lev4
att5_lev1-att5_lev4
att6_lev1-att6_lev7
att7_lev1-att7_lev5
none - the utility score associated with choosing none.
The attX_levY variables are real numbers, summing to 0 across each attribute.
Please let me know if you need more information.
data_null__
Jade | Level 19
Can you post a few records?
deleted_user
Not applicable
Sure - here you go:-

respid att1_lev1 att1_lev2 att1_lev3 att1_lev4 att1_lev5 att1_lev6 att1_lev7 att1_lev8 att1_lev9 att1_lev10 att1_lev11 att1_lev12 att2_lev1 att2_lev2 att2_lev3 att2_lev4 att2_lev5 att3_lev1 att3_lev2 att3_lev3 att3_lev4 att3_lev5 att4_lev1 att4_lev2 att4_lev3 att4_lev4 att5_lev1 att5_lev2 att5_lev3 att5_lev4 att6_lev1 att6_lev2 att6_lev3 att6_lev4 att6_lev5 att6_lev6 att6_lev7 att7_lev1 att7_lev2 att7_lev3 att7_lev4 att7_lev5 none
1 1.755 -0.358 -1.561 -0.114 -1.135 0.564 1.21 -1.237 -1.431 1.247 0.37 0.691 -0.828 -0.32 0.007 0.742 0.4 -2.432 0.762 0.798 0.872 -0.641 0.931 -0.348 0.805 -0.747 2.162 1.729 0.323 -4.214 1.228 0.35 -0.363 1.649 -0.314 -0.005 -2.544 1.411 0.291 0.739 -2.076 -0.365 4.765
2 -0.836 -0.354 -0.428 1.501 -1.386 0.051 -0.23 0.968 0.857 -0.248 0.439 -0.333 0.328 0.172 1.37 0.152 -2.022 -2.435 -0.406 1.675 1.167 -0.558 0.837 -0.325 -0.05 0.095 0.88 1.325 1.384 -3.589 3.569 0.71 -0.034 -0.043 0.891 -1.405 -3.688 1.277 0.149 -0.539 -0.944 0.057 6.249
3 -1.116 0.309 -0.456 -0.004 0.199 -0.384 -0.386 0.032 0.291 -0.252 0.961 0.805 -0.491 -0.277 0.63 0.582 -0.444 -1.911 0.244 1.013 0.653 -0.885 -0.811 0.208 0.701 0.787 6.332 1.262 -1.957 -5.637 1.2 1.034 0.586 -0.523 -0.098 -0.626 -1.571 -0.581 0.9 0.066 -1.022 0.637 7.085
4 0.325 -0.934 0.627 -0.913 0.98 0.125 -1.125 0.404 -0.535 0.918 1.103 -0.975 -1.859 0.35 1.71 -0.048 -0.153 -3.26 -0.091 0.595 2.756 0.156 -0.56 0.489 -0.002 -0.082 5.518 1.427 -1.683 -5.263 0.375 0.157 0.182 -0.468 0.682 -0.243 -0.684 0.211 0.553 0.614 -1.577 0.197 5.809

Not easy to see here - but should copy okay into Excel or something.
Thanks
deleted_user
Not applicable
Tried the code and it's definitely the way forward! Thanks a lot 🙂
sbb
Lapis Lazuli | Level 10
To start, review your code for unnecessary DATA steps; for example, the NUM assignment can be coded in the initial DATA step.

And are you certain the code works as you have posted it - what is this statement doing:

where num=&num;

Recommend setting the FULLSTIMER option and interrogating each step, either PROC or DATA type. Isolate steps that may be consolidated into prior steps. Investigate using SAS views, where applicable.
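A minimal sketch of these suggestions, assuming the %let attN values defined at the top of the thread (whether a view actually helps depends on how many times COMB is read downstream):

[pre]
options fullstimer;   * per-step real time, CPU time and memory statistics in the log;

* NUM is assigned in the initial step and COMB is created as a view, so the
combinations are generated on demand rather than written to disk first;
data comb / view=comb;
do x1=1 to &att1; do x2=1 to &att2; do x3=1 to &att3; do x4=1 to &att4;
do x5=1 to &att5; do x6=1 to &att6; do x7=1 to &att7;
num + 1;
output;
end; end; end; end; end; end; end;
run;
[/pre]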


Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
The where num=&num clause is making sure I identify each specific design. But looking at the other reply, I'm wondering if instead I can do one big step on all respondents and all designs at once.
And - yes - it works fine on more manageable designs with fewer combinations to test out. It's just the size of this specific problem that's killing it.
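For reference, a minimal sketch of what such a single pass could look like: index the utility arrays directly with the loop counters instead of building 0/1 indicator variables. It assumes the UTIL layout described above and the %let attN counts from the top of the thread; the names NCOMB, PICKS and APPEAL are just for illustration.

[pre]
%let ncomb=168000;
data appeal;
array picks[&ncomb] _temporary_;   * pick counts per design, retained across respondents;
if _n_=1 then do i=1 to &ncomb; picks[i]=0; end;
set util end=last nobs=nresp;
array a1[*] att1_lev1-att1_lev&att1;
array a2[*] att2_lev1-att2_lev&att2;
array a3[*] att3_lev1-att3_lev&att3;
array a4[*] att4_lev1-att4_lev&att4;
array a5[*] att5_lev1-att5_lev&att5;
array a6[*] att6_lev1-att6_lev&att6;
array a7[*] att7_lev1-att7_lev&att7;
num=0;
do x1=1 to &att1; do x2=1 to &att2; do x3=1 to &att3; do x4=1 to &att4;
do x5=1 to &att5; do x6=1 to &att6; do x7=1 to &att7;
num+1;
total=a1[x1]+a2[x2]+a3[x3]+a4[x4]+a5[x5]+a6[x6]+a7[x7];
if total>none then picks[num]=picks[num]+1;   * this respondent would pick design NUM;
end; end; end; end; end; end; end;
if last then do num=1 to &ncomb;
appeal=picks[num]/nresp;   * share of respondents who would pick each design;
output;
end;
keep num appeal;
run;
[/pre]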
data_null__
Jade | Level 19
I think I understand most of what you are doing and have a program that only takes a few seconds to run. It is still not the most efficient way as I think it can all be done in one data step. But I haven't figured that out.

1) create scoring coefficients for each LEVel of each ATTribute. I used TRANSREG instead of GLMMOD because it names the new variables in a way that makes it easy to use with PROC SCORE.
2) For each of the 168,000 sets of coefficients compute the TOTAL from the UTIL data (PROC SCORE).
3) compare TOTAL to NONE and make a note of the ones that are larger than NONE.
4) not sure what to do next.

[pre]
%let att1=12;
%let att2=5;
%let att3=5;
%let att4=4;
%let att5=4;
%let att6=7;
%let att7=5;
data comb;
do att1_lev=1 to &att1;
do att2_lev=1 to &att2;
do att3_lev=1 to &att3;
do att4_lev=1 to &att4;
do att5_lev=1 to &att5;
do att6_lev=1 to &att6;
do att7_lev=1 to &att7;
num + 1;
length id $8;
id = cats('s',put(num,z6.));
output;
end;
end;
end;
end;
end;
end;
end;
run;
proc transreg data=comb design;
model class(att: / zero=none);
id id;
output out=opt(drop=intercept);
run;
proc contents data=opt varnum;
run;
proc print data=opt(obs=10);
run;

data util;
infile cards;
input
respid
att1_lev1-att1_lev12
att2_lev1-att2_lev5
att3_lev1-att3_lev5
att4_lev1-att4_lev4
att5_lev1-att5_lev4
att6_lev1-att6_lev7
att7_lev1-att7_lev5
none;
cards;
1 1.755 -0.358 -1.561 -0.114 -1.135 0.564 1.21 -1.237 -1.431 1.247 0.37 0.691
-0.828 -0.32 0.007 0.742 0.4 -2.432 0.762 0.798 0.872 -0.641 0.931 -0.348
0.805 -0.747 2.162 1.729 0.323 -4.214 1.228 0.35 -0.363 1.649 -0.314 -0.005
-2.544 1.411 0.291 0.739 -2.076 -0.365 4.765
2 -0.836 -0.354 -0.428 1.501 -1.386 0.051 -0.23 0.968 0.857 -0.248 0.439
-0.333 0.328 0.172 1.37 0.152 -2.022 -2.435 -0.406 1.675 1.167 -0.558 0.837
-0.325 -0.05 0.095 0.88 1.325 1.384 -3.589 3.569 0.71 -0.034 -0.043 0.891
-1.405 -3.688 1.277 0.149 -0.539 -0.944 0.057 6.249
3 -1.116 0.309 -0.456 -0.004 0.199 -0.384 -0.386 0.032 0.291 -0.252 0.961
0.805 -0.491 -0.277 0.63 0.582 -0.444 -1.911 0.244 1.013 0.653 -0.885 -0.811
0.208 0.701 0.787 6.332 1.262 -1.957 -5.637 1.2 1.034 0.586 -0.523 -0.098
-0.626 -1.571 -0.581 0.9 0.066 -1.022 0.637 7.085
4 0.325 -0.934 0.627 -0.913 0.98 0.125 -1.125 0.404 -0.535 0.918 1.103 -0.975
-1.859 0.35 1.71 -0.048 -0.153 -3.26 -0.091 0.595 2.756 0.156 -0.56 0.489
-0.002 -0.082 5.518 1.427 -1.683 -5.263 0.375 0.157 0.182 -0.468 0.682
-0.243 -0.684 0.211 0.553 0.614 -1.577 0.197 5.809
;;;;
run;
proc print;
run;

proc score data=util score=opt out=utilScr;
id respid none;
var
att1_lev1-att1_lev12
att2_lev1-att2_lev5
att3_lev1-att3_lev5
att4_lev1-att4_lev4
att5_lev1-att5_lev4
att6_lev1-att6_lev7
att7_lev1-att7_lev5
;
run;
proc print data=utilscr(drop=s000011-s168000);
* this data set has LOTS of variables so be careful when printing it.;
run;

* create a data set of scores that are larger than NONE.;
data pick;
set utilScr;
array s[*] s:;
drop s:;
do pickid = 1 to dim(s);
if s[pickid] gt none then do;
total = s[pickid];
output;
end;
end;
run;
proc print data=pick(obs=100);
run;
[/pre]
deleted_user
Not applicable
This is exactly what I need - thanks a lot for that.
All I need to do at the end is calculate how many individuals have picked each item, which is really straightforward from the PICK dataset.
Genius!
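A minimal sketch of that last counting step, using the PICK dataset created above (the output names APPEAL and N_PICKS are just for illustration; designs that nobody picks simply will not appear):

[pre]
* one row per picked design: PICKID plus the number of respondents who picked it;
proc freq data=pick noprint;
tables pickid / out=appeal(drop=percent rename=(count=n_picks));
run;
[/pre]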
