Good evening,
I have a dataset that contains the predicted points for each player in a fantasy football competition. The fields in the dataset are:
Player
Team
Position (G, D, M or A)
Points
Price
Is it possible to use a procedure that would enable me to pick the combination of players that would score maximum points while satisfying the following conditions:
Number of G selected = 1
Number of D selected = 4
Number of M selected = 4
Number of A selected = 2
Total price of players selected <= £75,000,000
Many thanks for any help you can give,
cxkev
Hi cxkev,
I have a blog post that is about exactly this problem. Unfortunately it is in Hungarian, but the formulas are in English.
Kell egy csapat! – a feladat megoldása (in English: "We need a team! Solving the problem") - Analitika anyanyelven
But you can download sample data and code (OPTMODEL) from here:
http://blogs.sas.com/content/analitika/files/2012/09/generate_data_sas.txt
http://blogs.sas.com/content/analitika/files/2012/09/solve_team_sas.txt
I'm sure there is a mathematical formula for this (that I am not aware of, so take this with a grain of salt), but you could break it into a logic problem and avoid any 'formulas' altogether.
By definition, if you had no price cap, you would simply pick the 1 G, 4 D, 4 M, and 2 A that scored the most points. However, you have a price cap.
So what I would do is create a new variable that is the player's Points / Cost. Then I would select the best players in every group (ranked by points scored, regardless of price), then figure out how much you are OVER your spending budget.
Then rank your players by your "points / cost" variable. Replace the player with the worst points / cost in your selection with the next available player, and see if you are then under budget.
Continue this process in a loop and once you are under budget you are "very close" to maximized in your points scored. There's a little more work from here, however.
Let's say you are only 1 million over budget, and you replace your 3rd best D with the 5th best D, but their salaries are very far apart. Maybe now you are 8 million under budget. Well, you might be able to get a better "G" player, so you'd have to loop over all available players to see whether the difference in salaries is small enough for you to add them back.
Again, once this logic check fails you are maximized on points scored.
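A minimal sketch of the first steps of this heuristic (the ratio, the best players per position, and the overspend), not the full replacement loop. It assumes the data sits in a dataset called have with the variables Position, Points and Price, and it uses a budget of 75000 to match the scale of the mock data posted later in this thread; the names ranked, start, ratio, pos_rank and need are made up for illustration.

```sas
/* Step 1: points per unit of price for every player */
data ranked;
  set have;
  ratio = Points / Price;
run;

/* Step 2: within each position, best points first */
proc sort data=ranked;
  by Position descending Points;
run;

/* Step 3: keep the top 1 G, 4 D, 4 M and 2 A, ignoring price */
data start;
  set ranked;
  by Position;
  if first.Position then pos_rank = 0;
  pos_rank + 1;
  select (Position);
    when ('G') need = 1;
    when ('A') need = 2;
    otherwise  need = 4;   /* D and M */
  end;
  if pos_rank <= need;
run;

/* Step 4: how far over budget is the starting team? */
proc sql;
  select sum(Points) as total_points,
         sum(Price)  as total_price,
         sum(Price) - 75000 as over_budget
  from start;
quit;
```

If over_budget is positive, the swapping described above would start from the selected player with the lowest ratio. Being a heuristic, it is not guaranteed to reach the true optimum.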
You can use PROC OPTMODEL to formulate the problem and solve it with the mixed integer linear programming (MILP) solver. You need to introduce one binary variable per player, with the interpretation that the variable equals 1 if and only if that player is selected. Each of the conditions (the four position counts and the price cap) corresponds to a linear constraint over these variables. The objective function to be maximized is also a linear function of these variables.
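Written out, the formulation described above could look as follows. The notation is introduced here for illustration and is not from the original posts: P is the set of players, P_G, P_D, P_M, P_A are the players at each position, s_p and c_p are the predicted points and price of player p, B is the budget, and x_p is the binary selection variable.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{p \in P} s_p \, x_p \\
\text{s.t.}\quad & \sum_{p \in P_G} x_p = 1, \qquad \sum_{p \in P_D} x_p = 4, \\
& \sum_{p \in P_M} x_p = 4, \qquad \sum_{p \in P_A} x_p = 2, \\
& \sum_{p \in P} c_p \, x_p \le B, \\
& x_p \in \{0, 1\} \quad \text{for all } p \in P .
\end{aligned}
```

With the original question's numbers, B would be 75,000,000.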
You don't need SAS/OR either; a simple data step is enough.
Xia Keshan
How exactly would you solve this with a simple data step?
Sure. The condition used here is: total price of players selected <= £75,000.
data have;
input Player Position $ Points Price ;
cards;
1 G 2 8000
2 D 4 7000
3 M 6 12000
4 A 8 10000
5 G 4 9000
6 A 9 14000
8 D 4 8000
9 M 6 4000
10 A 8 10000
12 A 9 14000
13 D 4 7000
14 M 6 12000
17 A 9 14000
18 G 2 8000
19 D 4 7000
20 M 6 12000
24 D 4 7000
25 M 6 12000
29 D 4 7000
30 M 6 7000
32 M 6 1000
36 D 4 7000
37 M 6 12000
42 D 4 7000
43 M 6 12000
45 G 4 9000
46 A 9 14000
47 D 4 7000
48 M 6 12000
55 D 4 7000
56 M 6 12000
60 D 4 6000
61 M 6 9000
63 M 6 2000
67 D 4 5000
68 M 6 9000
70 M 8 6000
;
run;
%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
proc sort data=have ;by position;run;
data _null_;
 set have;
 by position;
 if last.position then call symputx(position,_n_);
run;
data _null_;
 set have end=last;
 length list $ 100 point_sum point_max cost _cost 8 ;
 array pla{&nobs} _temporary_ ;
 array pos{&nobs} $ _temporary_ ;
 array poi{&nobs} _temporary_ ;
 array pri{&nobs} _temporary_ ;
 pla{_n_}=Player;
 pos{_n_}=Position;
 poi{_n_}=Points;
 pri{_n_}=Price;
 if last then do;
  do i1=1 to &A ;
  do i2=i1+1 to &A ;
  do j1=%eval(&A+1) to &D ;
  do j2=j1+1 to &D ;
  do j3=j2+1 to &D ;
  do j4=j3+1 to &D ;
  do m=%eval(&D+1) to &G ;
  do n1=%eval(&G+1) to &M ;
  do n2=n1+1 to &M ;
  do n3=n2+1 to &M ;
  do n4=n3+1 to &M ;
   point_sum=sum(poi{i1},poi{i2},poi{j1},poi{j2},poi{j3},poi{j4},poi{m},poi{n1},poi{n2},poi{n3},poi{n4});
   cost=sum(pri{i1},pri{i2},pri{j1},pri{j2},pri{j3},pri{j4},pri{m},pri{n1},pri{n2},pri{n3},pri{n4});
   if point_sum gt point_max and cost le 75000 then do;
    point_max=point_sum;
    _cost=cost;
    list=catx('|',pla{i1},pla{i2},pla{j1},pla{j2},pla{j3},pla{j4},pla{m},pla{n1},pla{n2},pla{n3},pla{n4});
   end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  end;
  putlog 'Players : ' list 'Max Points : ' point_max 'Cost : ' _cost;
 end;
run;
Players : 6|12|2|13|60|67|5|9|32|63|70 Max Points : 64 Cost : 75000
Xia Keshan
This can hardly be called a 'simple data step', except maybe for you. Nonetheless, you got the job done nicely. However, you probably need an additional step to obtain all of the qualifying combinations, or you could go the extra mile within a single data step by setting up a hash or multi-dimensional array to retain them. In your mock data there are 432 combinations meeting the same criteria.
Here is my dumber approach,
proc sql noprint;
select player into :gp separated by ' ' from have where Position='G' ;
select count(player) into :gc separated by ' ' from have where Position='G' ;
select player into :dp separated by ' ' from have where Position='D' ;
select count(player) into :dc separated by ' ' from have where Position='D' ;
select player into :mp separated by ' ' from have where Position='M' ;
select count(player) into :mc separated by ' ' from have where Position='M' ;
select player into :ap separated by ' ' from have where Position='A' ;
select count(player) into :ac separated by ' ' from have where Position='A' ;
quit;
data _gp;
array gp[&gc.] (&gp.);
n=dim(gp);
k=1;
ncomb=comb(n, k);
do j=1 to ncomb;
call allcomb(j, k, of gp[*]);
output;
end;
keep gp1;
run;
data _dp;
array dp[&dc.] (&dp.);
n=dim(dp);
k=4;
ncomb=comb(n, k);
do j=1 to ncomb;
call allcomb(j, k, of dp[*]);
output;
end;
keep dp1-dp4;
run;
data _mp;
array mp[&mc.] (&mp.);
n=dim(mp);
k=4;
ncomb=comb(n, k);
do j=1 to ncomb;
call allcomb(j, k, of mp[*]);
output;
end;
keep mp1-mp4;
run;
data _ap;
array ap[&ac.] (&ap.);
n=dim(ap);
k=2;
ncomb=comb(n, k);
do j=1 to ncomb;
call allcomb(j, k, of ap[*]);
output;
end;
keep ap1-ap2;
run;
proc sql;
create table comall as
select * from _gp, _dp, _mp, _ap;
quit;
data h1;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have');
h.definekey('player');
h.definedata(all:'y');
h.definedone();
end;
set comall;
array p gp1--ap2;
call missing (_price, _points);
do over p;
rc=h.find(key:p);
_price+price;
_points+points;
end;
if _price <=75000;
run;
proc sql;
create table want(keep = gp1 dp1-dp4 mp1-mp4 ap1 ap2) as
select * from h1 having _points=max(_points);
quit;
Haikuo
Both data step / sql approaches are using brute force enumeration and hard coded sizes for the player positions (number of players needed). What happens if the number of players needed would change? What if you tried to scale this up? How long does the data step take to run if you had a field of 500 players - which is quite common in Fantasy sports applications?
For this tiny example, on my machine,
data rules;
input position $ numPlayersPerPos;
datalines;
G 1
D 4
M 4
A 2
;
run;
proc optmodel;
set PLAYERS;
num price{PLAYERS};
num points{PLAYERS};
str position{PLAYERS};
set<str> POSITIONS;
num numPlayersPerPos{POSITIONS};
var assign{PLAYERS} binary;
max sumPoints=sum{p in PLAYERS}points[p]*assign[p];
con Budget: sum{p in PLAYERS}price[p]*assign[p]<=75000;
con NumPos{po in POSITIONS}:sum{p in PLAYERS:position[p]=po}assign[p]=numPlayersPerPos[po];
read data have into PLAYERS=[Player] position price points;
read data rules into POSITIONS=[position] numPlayersPerPos;
solve;
create data team from [player=p]={p in PLAYERS:assign[p]>0.9};
quit;
Point taken, and it is no surprise that a specialized SAS procedure wins. But what if you don't have SAS/OR? SAS/OR is not like SAS/STAT, which most customers want to have just to be on the safe side. For those who need this kind of functionality only occasionally, a data step or PROC SQL still stands as a viable solution.
Everyone I know wants to have SAS/OR.
True. I want it, but sadly we don't have it, even as a company with more than 400 SAS users.
(Disclaimer: I write models for SAS/OR customers.)
I'd venture to say that for any sufficiently complex business, the cost of not having SAS/OR is greater than the cost of the license, and that the difference between those two costs only increases as advances in predictive analytics and data management increase analytical maturity.
Beyond computing time, think of analyst time. That is what is most expensive. Compare the complexity of the solutions on this page. If the opportunity to automate a decision process is ignored because implementing a solution using imperative languages is too complicated, or is too difficult to adapt to changing underlying circumstances, then that business process remains manual, errors and inefficiencies in that business process remain undetected, and all the interactions between that process and the other processes in the firm suffer.
In contrast, when it is easy to describe a decision process using declarative constructs that are close to the business rules themselves, and easy to use an optimization engine to automate the computation of the solution, then more processes are automated, and the automation of each process enables gains that start from that very specific operation but then emanate throughout the firm.
Those network effects are benefits beyond the traditional, more immediate benefit of running the process itself in a more efficient manner.
While I heartily embrace your sentiment, your general comments also apply to SAS/IML, SAS/ETS, SAS/QC, and even SAS/AF. And if you want to go beyond that, they also apply to E-miner, and on top of that, Text-miner, and one step further, SAS Sentiment Analysis, and we haven't touched many other great SAS solution servers. At the end of the day, it is all about cost of ownership versus potential risks, and where to draw the line is usually hard to calculate and therefore rather subjective. While I am nowhere close to the pay grade that decides which product to buy, I can still see how it goes. When a business feels confused or undecided, that is when it steps on the brakes, and sadly, that is also when it resorts to third-party solutions such as R.
"What happens if the number of players needed would change? What if you tried to scale this up?"
I can make a macro, like SAS/OR did; that is not a big deal.
"What happens if the number of players needed would change? What if you tried to scale this up?"
That depends on which algorithm you are using. I would like to know a better and faster algorithm for this question. It would be generous if you could show the exact algorithm SAS/OR uses as a SAS data step, or just explain it in English. Maybe I could end up with a better and faster algorithm in the near future.
As Bian said, SAS/OR costs money, and I also believe SAS/OR has the best algorithm in the world for this kind of PROGRAMMING. But not every company wants to pay for it. So a data step or some other technique is an alternative, although it takes much more time than SAS/OR. But that is an algorithm problem.
Xia Keshan