DATA Step, Macro, Functions and more

How to create a function to be used on aggregate data within a column by a group?

Reply
New Contributor
Posts: 4

How to create a function to be used on aggregate data within a column by a group?

[ Edited ]

I have two questions. First being how do I create a function in SAS/SQL that takes in three numbers and outputs a number to be stored in a variable. See question below, and then chart for clarification.

Second question, how do I apply the aforementioned function on the current observation variable (in a datastep) onto every other value in a column WITHIN a by group.

Please see the "chart"  below for a clearer picture.

 

This is (approximately) what my dataset looks like.

RaceNum        PlayerID       Rank     OldScore      NewScore

  Race1         25909                1             90                   . -> avg(F(90, 87, 1), F(90, 83, 1), ..., F(90, 66, 1))

  Race1         34391                2             87                   . -> avg(F(87, 90, 0), F(87, 83, 1), ..., F(87, 66, 1))

  Race1         38971                3             83                   . -> avg(F(83, 90, 0), F(83, 87, 0), ..., F(83, 66, 1))

     ...                  ...                  ...             ...                   . -> ...

  Race1         23023                8             66                   .->  avg(F(66, 90, 0), F(66, 87, 0), ..., F(66, 73, 0))

  Race2         44919                1             91                   . -> ...

     ...                 ...                   ...             ...                    . -> ...

  Race 2        25921               16            77                   . -> ...

    ...                 ...                   ...              ...                    . -> ...

Race1276      35901              30            53                   .

 

%F(athlete, opponent, winner) = *Where F is some user defined function. winner= 1 if opponent loses and otherwise = 1;

 

I was thinking of somehow transposing the dataset to get something like this:

 

RaceNum        PlayerID     Rank   OldScore   NewScore    Rank1Score     Rank2Score     Rank3Score ... Rank8Score  R16Sc

  Race1             25909              1       90              .                      90                      87                   83                   66

  Race1             34391              2       87              .                      90                      87                   83                   66

     ...                    ...                  ..        ..

  Race2             44919              1       91                                     91                       ...                                            ...             53

    ...

I could do this by possibly by having 50 if statements (largest number of racers in a race) 

 

 

data test1;
    set test;
    if Rank=1 then Rank1Score = OldScore;     else if Rank=1 then Rank2Score = Oldscore;                                  ....     else if Rank=50 then Rank50Score = OldScore; run;

And then use SQL:

 

 

proc sql;
    create table test2 as
    select *, max(Rank1Score) as rnk1score, ...., max(Rank50Score) as rnk50score
    from test1
    group by RaceNum;
quit;

 

 

Or using retain and lag statements and then left joining the last entry in each race using SQL.

 

But even then, I am not sure how to go about Rank&NScore Columns in SAS...

 

tl;dr: I am stuck.

 

I appreciate any questions, ideas, criticisms, pointers, or even solutions Smiley Wink.

 

Thank you.

 

 

Super User
Posts: 17,730

Re: How to create a function to be used on aggregate data within a column by a group?

Are you required to use SQL? 

And what are you trying to do in plain language? 

Super User
Super User
Posts: 7,392

Re: How to create a function to be used on aggregate data within a column by a group?

Would be a good idea if you clarified your question.  Post some simple test data - in the form of a datastep - and what the output should look like.  From that mass of text I gather than you are transposing your data up - unless its for an output file I would recommend not doing that as it will just make your coding more difficult.

Anyways, you can simply achieve what you want by:

/* assumes sorted! */
proc transpose data=have out=want prefix=rankscore;
  by racenum playerid;
  var oldscore;
  id rank;
  idlabel rank;
run;

Note that this will make your variables rankscoreX, not as you suggested as it is far easier to work with variable names with the numeric suffix.

New Contributor
Posts: 4

Re: How to create a function to be used on aggregate data within a column by a group?

Thank you! I transposed the data as you showed, and managed to get the scores by race in a row.
Ask a Question
Discussion stats
  • 3 replies
  • 168 views
  • 1 like
  • 3 in conversation