Thanks for the example. I see that the scores are added with "multiplicities," i.e., player 1 has ScoreA=100, but they contribute their 100 "points" for each of their assignments, in this case for assignments G, H and I, resulting in a total contribution of 300 to the total of scoreA (706) in group 2.
There appears to be an inconsistency with ScoreC: In the final result it seems that ScoreC = ScoreA + ScoreB. However, if I apply the same rules as for ScoreA and ScoreB to ScoreC, I obtain 210 and 300 for the totals of ScoreC in the two groups, not 195 and 2118. Obviously, a sum of values from {15, 30} (see column ScoreC in your earlier post) can never be 2118.
I think i typed the data in wrong since it wouldn't allow me to cut and paste. ScoreC is not the sum of ScoreA+ScoreB, that is just a coincidence. 210 and 300 are the correct scores for ScoreC. I'm so sorry for the typo.
Thank you again for helping with this.
No problem. The extended macro shown below contains two simple additional PROC steps, the second of which creates a dataset TOTAL_SCORES which is eventually merged with the other group-level data to form the result dataset.
I assumed that you have the score data in the form of a dataset with one observation per player. But it would be easy to create such an input dataset from, e.g., the sample data in your earlier post, if necessary.
Macro ITER contains new parameters DSSC and SCVARS to keep the names of the score input dataset and the score variables flexible.
/* Create test data */
data scores;
input p ScoreA ScoreB ScoreC;
cards;
1 100 200 30
2 10 20 30
3 5 10 15
4 2 4 30
5 10 20 30
6 10 20 30
7 5 10 30
8 2 4 30
9 100 200 30
10 100 200 30
;
data have;
input a $ p;
cards;
A 2
B 2
C 3
D 3
E 4
F 4
G 1
H 1
I 1
A 5
B 6
C 6
D 7
E 8
F 9
G 9
H 10
I 10
;
/* Macro to perform the iterative algorithm */
%macro iter( dsin =have /* input dataset linking players and assignments */
, dssc =scores /* input dataset with scores for each player */
, scvars=ScoreA ScoreB ScoreC /* list of score variables */
, dsout=want /* output dataset */
, v1=a /* variable name for assignment IDs */
, v2=p /* variable name for player IDs */
, d1=_ /* delimiter for final list of assignment IDs */
, d2=%str(,) /* delimiter for final list of player IDs */
);
%local i n1 n2;
%let i=1;
proc sort data=&dsin presorted;
by &v1;
run;
/* Assign initial group numbers g */
data t1;
set &dsin;
by &v1;
g+first.&v1;
run;
proc sql noprint; /* The sum of all group numbers */
select sum(g) into :n1 /* is used to detect changes */
from t1; /* after reassigning groups. */
quit;
%if &n1>=%sysfunc(constant(EXACTINT))
%then %put %str(WAR)NING: Sum of group numbers has become too large. Results may be incorrect!;
%do %until(&n1=&n2);
%let i=%eval(3-&i); /* toggle i between 1 and 2 */
proc sql noprint; /* If one player has been assigned */
create table t&i as /* multiple group numbers, they are */
select &v1, &v2, min(g) as g /* replaced by their minimum. */
from t%eval(3-&i) /* In the next iteration the same */
group by &&v&i; /* procedure is applied to assign- */
/* ment IDs rather than player IDs. */
select sum(g) into :n&i /* And so on, alternating, until */
from t&i; /* no reassignments occur, i.e., t1 */
quit; /* and t2 are equal up to sort order */
%end; /* and hence &n1=&n2. */
%do i=1 %to 2; /* Sorted lists of distinct assign- */
proc sql; /* ment and player IDs are prepared. */
create table u&i as
select distinct g, &&v&i
from t&i;
quit;
data r&i; /* Finally, the delimited lists of */
do until(last.g); /* IDs are created for reporting. */
set u&i;
by g;
length &&v&i..list $32767; /* length might be reduced carefully */
&&v&i..list=catx("&&d&i", &&v&i..list, &&v&i);
end;
drop &&v&i;
run;
%end;
proc sql; /* Scores are retrieved */
create table s0 as
select x.&v2, g, %sysfunc(translate(%cmpres(&scvars),%str(,),%str( )))
from t2 x left join &dssc y
on x.&v2=y.&v2;
quit;
proc summary data=s0 nway; /* Scores are summed within groups */
class g;
var &scvars;
output out=total_scores(drop=_:) sum=;
run;
/* Create result dataset */
data &dsout;
merge r1 r2 total_scores;
by g;
drop g;
label &v1.list='Assignment ID'
&v2.list='Players';
run;
%mend iter;
%iter;
proc print data=want noobs label;
run;
I'll be back tomorrow (CEST).
Thank you so much!!!!!
Now they're telling me that they only want the score counted once for each player no matter how many assignments they fall under. Using the data sets from before the ending result should be as follows:
assignment player ScoreA ScoreB ScoreC
A_B_C_D 2,3,5,6,7 40 80 135
E_F_G_H_I 1,4,8,9,10 304 608 150
How do I only count the player's score once?
Thank you in advance for your help.
You're welcome.
You won't believe how easy it is to implement this latest change in the macro:
Change
from t2 x left join &dssc y
into
from u2 x left join &dssc y
in the last PROC SQL step of the macro. Yes, it's really a change of only one character.
So simple and yet i never would have figured it out. Thank you so much for your help. This works perfectly!
Dear FreelanceReinhard,
Ok, this is all above my head but I am slowly learning new things but I need more help. Now I have to distributed the combined scores as equally as possible across four weeks after grouping and summing scores. The attached file has the input and output files. All of the earlier rules apply.
Tables can have multiple unique players assigned to them. players can be in multiple tables.
I have to schedule all of the tables and players together. So if an table has 5 players they all need to be scheduled on January 1st. However, if those 5 players also fall into other tables, they need to be scheduled on January 1st as well. All players in an table and all tables with mutual players need to be scheduled at the same time.
The scores need to be counted once for each player no matter how many tables they fall under.
Thanks in advance for your help.
Hello @Krislynn742005,
This new task is independent of what we've done so far. The calculation of the total score (ScoreA+ScoreB+...) could be included in the existing macro (very easily). So, essentially the question is how to partition a set of n elements into four subsets such that the values assigned to the elements (i.e. the total scores) have as similar totals in the four subsets as possible.
To precisely answer this question we have to perform what is called combinatorial optimization -- a non-trivial exercise. There are different methods available, not all in Base SAS, though.
Questions:
proc setinit;
run;
I'll be back tomorrow (CEST).
Hi FreelanceReinhard,
I've checked and I do not have SAS/OR with my license. Sorry. 😞 I'm going to try and ask if we can get it.
Hi @Krislynn742005,
Thanks for checking. The background of my questions was: If n is too large to allow for a (so called) brute-force approach (i.e. to examine virtually all possible partitions, which could be done with Base SAS alone, in principle), we should apply a more sophisticated algorithm to find an optimal (or near-optimal) solution. I know from a couple of other threads in this forum that SAS/OR provides the best tools for such tasks. However, there's another set of strong tools available in SAS/IML: genetic algorithms. (Ksharp has delivered several solutions using those for other optimization problems.) In the "worst-case scenario" -- large n and only Base SAS available -- we could still develop a simple ad-hoc algorithm which may be good enough for your purpose. (I mean, you're not preparing a Ph.D. thesis, but just want to schedule some events.)
Edit: In any case we'll need the optimality criterion (question 1) to assess different partitions.
Good morning FreelanceReinhard,
The the exact criterion for "as equally as possible," is this: we want each week to have as close to the average number of tests for each Score. I've attached a file to show what I'm talking about. For example, the TotalScores across all Table/Player combinations for ScoreA, ScoreB, ScoreC and ScoreD are 67, 134, 201 and 268 respectively. The Average Scores are 17, 34, 50 and 67. In lines 13-16 of the attached file, you see each week and the scores assigned. Weeks 1 - 3 are pretty close to the actual averages. Week 4 however, is lower but I think that has to do with the small data set.
Does this make sense? All the scores need to be distributed "as equally as possible" not the total score.
Also, I checked and I do have SAS/IML
Thanks,
Krislynn
Hello Krislynn,
Many thanks for the clarification. I see that the "average scores" are simply the total scores divided by 4 (weeks) and rounded to integers. So, your problem is even more similar to that in the other recent thread Ksharp linked to (where three "scores" were involved rather than four) -- which means that Ksharp's solution from there (heavily using SAS/IML) will likely need only minor modifications.
Unfortunately, I don't have a SAS/IML license, nor sufficient experience with this SAS module. But @Ksharp already wrote "I can do that by IML," so I hope he will take a look. (Please note that it's now late at night in his Chinese time zone.) I will be happy to provide additional support, if needed, as far as my macro (ITER) is concerned. And I will give likes to Ksharp's solution as soon as you confirm that it works.
Still, I'm curious how big is n (the number of PlayerLists -- not the number of players) in your real data. I assume that you have already performed the grouping using macro ITER and thus determined n.
Hi FreelanceReinhard and KSharp,
After running the ITER, my 100,000+ unique table/player combinations the n comes in around 5,000. Thanks in advance to you both with assistance on this issue. I've only been using SAS for about 5 years and have no prior programming experience so I'm still learning a lot.
Thanks,
Krislynn
@Krislynn742005 wrote:
... n comes in around 5,000.
Thanks, this is interesting. According to an approximation formula, the number of partitions of a 5000-element set into 4 non-empty subsets is an integer with about 3,000 digits! So, a brute-force approach (with Base SAS alone) would be totally hopeless. Fortunately, you have a SAS/IML license.
Thank goodness I have SAS/IML. 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.