BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FreelanceReinh
Jade | Level 19

Thanks for the example. I see that the scores are added with "multiplicities," i.e., player 1 has ScoreA=100, but they contribute their 100 "points" for each of their assignments, in this case for assignments G, H and I, resulting in a total contribution of 300 to the total of scoreA (706) in group 2.

 

There appears to be an inconsistency with ScoreC: In the final result it seems that ScoreC = ScoreA + ScoreB. However, if I apply the same rules as for ScoreA and ScoreB to ScoreC, I obtain 210 and 300 for the totals of ScoreC in the two groups, not 195 and 2118. Obviously, a sum of values from {15, 30} (see column ScoreC in your earlier post) can never be 2118.

Krislynn742005
Fluorite | Level 6

I think i typed the data in wrong since it wouldn't allow me to cut and paste.  ScoreC is not the sum of ScoreA+ScoreB, that is just a coincidence.  210 and 300 are the correct scores for ScoreC.  I'm so sorry for the typo.

 

Thank you again for helping with this.

FreelanceReinh
Jade | Level 19

No problem. The extended macro shown below contains two simple additional PROC steps, the second of which creates a dataset TOTAL_SCORES which is eventually merged with the other group-level data to form the result dataset.

 

I assumed that you have the score data in the form of a dataset with one observation per player. But it would be easy to create such an input dataset from, e.g., the sample data in your earlier post, if necessary.

 

Macro ITER contains new parameters DSSC and SCVARS to keep the names of the score input dataset and the score variables flexible.

/* Create test data */

data scores;
input p ScoreA ScoreB ScoreC;
cards;
1  100 200 30
2   10  20 30
3    5  10 15
4    2   4 30
5   10  20 30
6   10  20 30
7    5  10 30
8    2   4 30
9  100 200 30
10 100 200 30
;

data have;
input a $ p;
cards;
A  2 
B  2 
C  3 
D  3 
E  4 
F  4 
G  1 
H  1 
I  1 
A  5 
B  6 
C  6 
D  7 
E  8 
F  9 
G  9 
H 10 
I 10 
;

/* Macro to perform the iterative algorithm */

%macro iter( dsin =have   /* input dataset linking players and assignments  */
           , dssc =scores /* input dataset with scores for each player      */
           , scvars=ScoreA ScoreB ScoreC /* list of score variables         */
           , dsout=want   /* output dataset                                 */
           , v1=a         /* variable name for assignment IDs               */
           , v2=p         /* variable name for player IDs                   */
           , d1=_         /* delimiter for final list of assignment IDs     */
           , d2=%str(,)   /* delimiter for final list of player IDs         */
           );
%local i n1 n2;

%let i=1;

proc sort data=&dsin presorted;
by &v1;
run;

/* Assign initial group numbers g */

data t1;
set &dsin;
by &v1;
g+first.&v1;
run;

proc sql noprint;        /* The sum of all group numbers */
select sum(g) into :n1   /* is used to detect changes    */
from t1;                 /* after reassigning groups.    */
quit;

%if &n1>=%sysfunc(constant(EXACTINT))
%then %put %str(WAR)NING: Sum of group numbers has become too large. Results may be incorrect!;

%do %until(&n1=&n2);
  %let i=%eval(3-&i); /* toggle i between 1 and 2 */

  proc sql noprint;         /* If one player has been assigned   */
  create table t&i as       /* multiple group numbers, they are  */
  select &v1, &v2, min(g) as g  /* replaced by their minimum.    */
  from t%eval(3-&i)         /* In the next iteration the same    */
  group by &&v&i;           /* procedure is applied to assign-   */
                            /* ment IDs rather than player IDs.  */
  select sum(g) into :n&i   /* And so on, alternating, until     */
  from t&i;                 /* no reassignments occur, i.e., t1  */
  quit;                     /* and t2 are equal up to sort order */
%end;                       /* and hence &n1=&n2.                */

%do i=1 %to 2;              /* Sorted lists of distinct assign-  */
  proc sql;                 /* ment and player IDs are prepared. */
  create table u&i as
  select distinct g, &&v&i
  from t&i;
  quit;

  data r&i;                 /* Finally, the delimited lists of   */
  do until(last.g);         /* IDs are created for reporting.    */
    set u&i;
    by g;
    length &&v&i..list $32767; /* length might be reduced carefully */
    &&v&i..list=catx("&&d&i", &&v&i..list, &&v&i);
  end;
  drop &&v&i;
  run;
%end;

proc sql;                   /* Scores are retrieved              */
create table s0 as
select x.&v2, g, %sysfunc(translate(%cmpres(&scvars),%str(,),%str( )))
from t2 x left join &dssc y
on x.&v2=y.&v2;
quit;

proc summary data=s0 nway;  /* Scores are summed within groups   */
class g;
var &scvars;
output out=total_scores(drop=_:) sum=;
run;

/* Create result dataset */

data &dsout;
merge r1 r2 total_scores;
by g;
drop g;
label &v1.list='Assignment ID'
      &v2.list='Players';
run;
%mend iter;

%iter;

proc print data=want noobs label;
run;

I'll be back tomorrow (CEST).

Krislynn742005
Fluorite | Level 6

Thank you so much!!!!!

 

Now they're telling me that they only want the score counted once for each player no matter how many assignments they fall under.  Using the data sets from before the ending result should be as follows:

 

assignment           player           ScoreA           ScoreB           ScoreC

A_B_C_D             2,3,5,6,7       40                    80                 135

E_F_G_H_I          1,4,8,9,10     304             ​     608              150

 

How do I only count the player's score once?

 

Thank you in advance for your help.

FreelanceReinh
Jade | Level 19

You're welcome.

 

You won't believe how easy it is to implement this latest change in the macro:

 

Change

from t2 x left join &dssc y

into

from u2 x left join &dssc y

in the last PROC SQL step of the macro. Yes, it's really a change of only one character.

Krislynn742005
Fluorite | Level 6

So simple and yet i never would have figured it out.  Thank you so much for your help.  This works perfectly!

 

Krislynn742005
Fluorite | Level 6

Dear FreelanceReinhard,

 

Ok, this is all above my head but I am slowly learning new things but I need more help.  Now I have to distributed the combined scores as equally as possible across four weeks after grouping and summing scores.  The attached file has the input and output files.  All of the earlier rules apply.

 

Tables can have multiple unique players assigned to them.  players can be in multiple tables.

 

I have to schedule all of the tables and players together.  So if an table has 5 players they all need to be scheduled on January 1st.  However, if those 5 players also fall into other tables, they need to be scheduled on January 1st as well.  All players in an table and all tables with mutual players need to be scheduled at the same time. 

 

The scores need to be counted once for each player no matter how many tables they fall under. 

 

Thanks in advance for your help.

FreelanceReinh
Jade | Level 19

Hello @Krislynn742005,

 

This new task is independent of what we've done so far. The calculation of the total score (ScoreA+ScoreB+...) could be included in the existing macro (very easily). So, essentially the question is how to partition a set of n elements into four subsets such that the values assigned to the elements (i.e. the total scores) have as similar totals in the four subsets as possible.

 

To precisely answer this question we have to perform what is called combinatorial optimization -- a non-trivial exercise. There are different methods available, not all in Base SAS, though.

 

Questions:

  1. What is the exact criterion for "as equally as possible," i.e., how should the program decide which of two partitions is "more equal" than the other? Possible criteria include the variance of the four total scores (smaller=better), the maximum difference between two total scores (again, smaller=better) and others.
  2. How big is (the number of PlayerLists)? In your attached example n=8. There are 1701 different partitions of an 8-element set into 4 non-empty subsets (according to Wikipedia), but this number grows rapidly as n increases: e.g., it's 34105 for n=10 (ibid.).
  3. Does your SAS license include SAS/OR? The output (in the log) of PROC SETINIT will tell:
    proc setinit;
    run;
  4. Same question for SAS/IML (possibly listed as IML Studio).

I'll be back tomorrow (CEST).

Krislynn742005
Fluorite | Level 6

Hi FreelanceReinhard,

 

I've checked and I do not have SAS/OR with my license.  Sorry. 😞  I'm going to try and ask if we can get it.

FreelanceReinh
Jade | Level 19

Hi @Krislynn742005,

 

Thanks for checking. The background of my questions was: If n is too large to allow for a (so called) brute-force approach (i.e. to examine virtually all possible partitions, which could be done with Base SAS alone, in principle), we should apply a more sophisticated algorithm to find an optimal (or near-optimal) solution. I know from a couple of other threads in this forum that SAS/OR provides the best tools for such tasks. However, there's another set of strong tools available in SAS/IML: genetic algorithms. (Ksharp has delivered several solutions using those for other optimization problems.) In the "worst-case scenario" -- large n and only Base SAS available -- we could still develop a simple ad-hoc algorithm which may be good enough for your purpose. (I mean, you're not preparing a Ph.D. thesis, but just want to schedule some events.)

 

Edit: In any case we'll need the optimality criterion (question 1) to assess different partitions.

Krislynn742005
Fluorite | Level 6

Good morning FreelanceReinhard,

 

The the exact criterion for "as equally as possible," is this: we want each week to have as close to the average number of tests for each Score.  I've attached a file to show what I'm talking about.  For example, the TotalScores across all Table/Player combinations for ScoreA, ScoreB, ScoreC and ScoreD are 67, 134, 201 and 268 respectively.  The Average Scores are 17, 34, 50 and 67.  In lines 13-16 of the attached file, you see each week and the scores assigned.  Weeks 1 - 3 are pretty close to the actual averages.  Week 4 however, is lower but I think that has to do with the small data set.

 

Does this make sense?  All the scores need to be distributed "as equally as possible" not the total score.

 

Also, I checked and I do have SAS/IML

 

Thanks,

Krislynn

 

 

FreelanceReinh
Jade | Level 19

Hello Krislynn,

 

Many thanks for the clarification. I see that the "average scores" are simply the total scores divided by 4 (weeks) and rounded to integers. So, your problem is even more similar to that in the other recent thread Ksharp linked to (where three "scores" were involved rather than four) -- which means that Ksharp's solution from there (heavily using SAS/IML) will likely need only minor modifications.

 

Unfortunately, I don't have a SAS/IML license, nor sufficient experience with this SAS module. But @Ksharp already wrote "I can do that by IML," so I hope he will take a look. (Please note that it's now late at night in his Chinese time zone.) I will be happy to provide additional support, if needed, as far as my macro (ITER) is concerned. And I will give likes to Ksharp's solution as soon as you confirm that it works.

 

Still, I'm curious how big is n (the number of PlayerLists -- not the number of players) in your real data. I assume that you have already performed the grouping using macro ITER and thus determined n.

Krislynn742005
Fluorite | Level 6

Hi FreelanceReinhard and KSharp,

 

After running the ITER, my 100,000+ unique table/player combinations the n comes in around 5,000.  Thanks in advance to you both with assistance on this issue.  I've only been using SAS for about 5 years and have no prior programming experience so I'm still learning a lot.

 

Thanks,

Krislynn

FreelanceReinh
Jade | Level 19

@Krislynn742005 wrote:

... n comes in around 5,000.


Thanks, this is interesting. According to an approximation formula, the number of partitions of a 5000-element set into 4 non-empty subsets is an integer with about 3,000 digits! So, a brute-force approach (with Base SAS alone) would be totally hopeless. Fortunately, you have a SAS/IML license.

Krislynn742005
Fluorite | Level 6

Thank goodness I have SAS/IML. 🙂

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 51 replies
  • 2235 views
  • 6 likes
  • 6 in conversation