I have a file of several hundred backgammon matches and I want to use SAS to calculate a rating for
each player, prior to each match, based on their past performance. I have the date, the names of each
player, the outcome for black (1=Win for Black, 0=Win for White) and the number of points in
each match. The file looks like this…
==============================================================================
Date Black White W/L Match Length Black_Rating White_Rating
==============================================================================
1/1/2001 Frank John 1 7 100 100
1/2/2001 Bill Jeff 0 5 100 100
1/4/2001 Jeff Bill 1 3 101 99
1/5/2001 John Frank 0 5 99 101
1/6/2001 Bill John 0 9 100 98
Etc…..
I want to calculate a modified version of the Elo system and start everyone’s rating at 100.
So Frank and John have never played so both rating would be 100 and the same for the next row containing Bill and
Jeff’s match. For simplicity, let’s say if you win a match you add 1 to your score and subtract 1 for a loss; so, row 3
would read 101 and 99 respectively.
My problem is I want SAS to recognize who the two opponents are, and put their current ratings in the corresponding
columns then based on the outcome of the match, calculate and remember the new rating.
I’ve tried using RETAIN but my problem arises from the fact that any person can be in the black or white column.
Any ideas?
Thanks,
John
If your pool of player is reasonably small to divide into cases, you can use a _temporary_ array to retain the latest rating result of each player and update it & your data step variables at the same time.
Alternatively, to avoid having to know what array index refers to what player, you can use hash tables to do the same and simply search the hash by player name with the last value.
data want;
if _n_=1 then do;
declare hash h();
h.defineKey('name');
h.defineData('rating');
h.defineDone;
end;
set have;
rc=h.find(name: black);
if rc NE 0 then do;
name=black;
rating=100;
black_rating=rating;
h.add();
end; else do;
black_rating=rating;
if win_loss=1 then rating=rating+1;
else rating=rating-1;
h.replace();
end;
rc=h.find(name: white);
if rc NE 0 then do;
name=white
rating=100;
white_rating=rating;
h.add();
end; else do;
white_rating=rating;
if win_loss=1 then rating=rating+1;
else rating=rating-1;
h.replace()
end;
drop rc rating;
run;
This is untested but if you want to provide a small sample dataset through a datalines statement in the forums with the appropriate variable names that you use, I can edit to achieve your desired result. If you are not familiar with hash tables, they're just lookup tables. The find() method returns 0 if the key was found in the table and some error code other wise hence if rc NE 0 I'm initiating their rating to 100. Since you wanted the current record to list their rating prior to the match, the black_rating=rating; statement comes before the update to the rating variable.
Unless defined for multidata, hash tables can only retain one record per key. Thus, you must use .add method to add a new key/data pair to the table yet must use replace() method to update the data associated with an existing key.
Vincent
P.S. This should also be more efficient in terms of processing speed since the hash keeps everything in memory but since you only have a few hundred games it is somewhat irrelevant
Change your data structure to be one person per line, so one match is on two lines
Date | Black | White | W/L | Match Length | Black_Rating | White_Rating |
to
Date Player Colour Win Rating Length
Transpose will not work in this context.
You would need to sort by player and date. It helps to have a 1 / -1 Score instead of 1 / 0 Win/Loss. Here is my suggestion (based on @Reeza);
Data Players;
Length Player $32 ;
Set Have ;
Retain Rating 0 ; /* Place holder */
Keep Player Date Colour Score Match_Length Rating ;
Player = White ;
Colour = 'White' ;
Score = 2 * W_L -1 ;
Output ;
Player = Black ;
Colour = 'Black' ;
Score = - Score ;
Output ;
Run ;
Proc Sort
Data = Players ;
By Player Date ;
Run ;
Data Players ;
Set Players ;
By Player Date ;
Prev_Rating = LAG (Rating) ;
If First.Player Then Rating = 100 ;
Else Rating = SUM (Prev_Rating, Score) ; /* Do not use LAG() in this statement */
Drop Prev_Rating ;
Run ;
To get back to the original format (assuming only one game played on each date, otherwise you will have to introduce a Game sequence column for multiple games per date)
Proc Sort Data = Players ;
By Date Colour ;
Run ;
Data Want ;
Set Have (Obs = 0) ;
Retain Black White W_L Match_Length Black_Rating White_Rating ;
Set Players ;
Black = Player ;
Black_Rating = Rating ;
W_L = (Score + 1) / 2 ;
Set Players ;
White = Player ;
White_ rating = Rating ;
Output ;
Run ;
Alert - untested code ;
Richard
Alternately for the last step you could do a merge of the data on itself :
Proc SQL ;
Create table Have as
Select b.Date
, b.player as Black_Player
, w.player as White_Player
, (b.Score + 1) /2 As W_L
, b.Match_Length
, b.rating as Black_Rating
, w.Rating as White_Rating
From Players (where = (colour = 'Black')) b
, Players (where = (Colour = 'White)) w
Where w.date = b.date
Order by date
;
Quit ;
Whatever
Richard
If your pool of player is reasonably small to divide into cases, you can use a _temporary_ array to retain the latest rating result of each player and update it & your data step variables at the same time.
Alternatively, to avoid having to know what array index refers to what player, you can use hash tables to do the same and simply search the hash by player name with the last value.
data want;
if _n_=1 then do;
declare hash h();
h.defineKey('name');
h.defineData('rating');
h.defineDone;
end;
set have;
rc=h.find(name: black);
if rc NE 0 then do;
name=black;
rating=100;
black_rating=rating;
h.add();
end; else do;
black_rating=rating;
if win_loss=1 then rating=rating+1;
else rating=rating-1;
h.replace();
end;
rc=h.find(name: white);
if rc NE 0 then do;
name=white
rating=100;
white_rating=rating;
h.add();
end; else do;
white_rating=rating;
if win_loss=1 then rating=rating+1;
else rating=rating-1;
h.replace()
end;
drop rc rating;
run;
This is untested but if you want to provide a small sample dataset through a datalines statement in the forums with the appropriate variable names that you use, I can edit to achieve your desired result. If you are not familiar with hash tables, they're just lookup tables. The find() method returns 0 if the key was found in the table and some error code other wise hence if rc NE 0 I'm initiating their rating to 100. Since you wanted the current record to list their rating prior to the match, the black_rating=rating; statement comes before the update to the rating variable.
Unless defined for multidata, hash tables can only retain one record per key. Thus, you must use .add method to add a new key/data pair to the table yet must use replace() method to update the data associated with an existing key.
Vincent
P.S. This should also be more efficient in terms of processing speed since the hash keeps everything in memory but since you only have a few hundred games it is somewhat irrelevant
I read up on Hash tables. Very interesting. I modified your code slightly and it worked. Thanks for your help.
-John
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.