BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ciaranhillery
Calcite | Level 5

I'm working with a team for the upcoming season, and have been tasked with compiling statistics tracking data after every session (training and/or game), and creating cumulative variables based on the data. In addition to creating the cumulative variables, I need to append the relevant variables from the previous 3 sessions that each player has attended, in order to track progression.

 

The variables that I need to keep from each sessions are called : Distance, Sprint_Distance & Top_Speed. The session data is extracted in CSV format for any player that has attended, and I append each sessions data to a master session dataset, which I need to be in order to create a count of how many sessions each player has attended. When reading in the session data, I append a numeric value to the end of each field, which indicates the session number. So session 1 will have field names:

 

Distance_1

Sprint_Distance_1

Top_Speed_1

 

Session 2 will have '_2' appended at the end, session 3 has '_3' appended and so on for each subsequent session.

 

While I've created the cumulative variables using the MEAN function, I'm having trouble extracting the data from the previous 3 sessions for each player. What I've tried is the following, extracting the last record and then deleting it so the next step will not extract the same record.

 

 

data previous1
     sessions1;
set path.raw_session_data;
by Player_Name;
if last.Player_Name then output previous1;
if last.Player_Name then delete;
output sessions1;
run;

 

data previous2
     sessions2;
set sessions1;
by Player_Name;
if last.Player_Name then output previous2;
if last.Player_Name then delete;
output sessions2;
run;

 

data previous3
     sessions3;
set sessions2;
by Player_Name;
if last.Player_Name then output previous3;
if last.Player_Name then delete;
output sessions3;
run;

 

data prev_sessions;
merge previous1
      previous2
      previous3;
by Player_Name;
run;

 

 

What I'm looking to do is to keep only the variables where data is available when identifying the last 3 sessions attended, instead of appending all session variables to each of the previous1-3 datasets being created. The problem is that I don't know the variable names that will be extracted, as players won't have attended every session so far. For example, player 1 has attended sessions 1, 2, 4 and 6, player 2 has attended sessions 1, 3, 5 and 6, player 3 has attended all sessions 1, 2, 3, 4, 5, 6 etc. so these numbers will be appended to the variable names as outlined previously.

 

I will then need to rename the variables to something like :

 

Previous_Distance_1 for the most recent previous session;

Previous_Distance_2 for the second most recent previous session;

Previous_Distance_3 for the third most recent previous session;

 

Hopefully I'm making sense!! Any help will be greatly appreciated!!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just make a vertical dataset instead.  Assuming that you already have the data in this format where the variable names in each dataset is different then here is a quick way to get it into a more normal structure.  This assumes that the individual datasets are already sorted by player_name.

data want ;
  length Player_Name $30 session_no Distance Sprint_Distance Top_Speed 8;
  set session_1 - session_6 ;
  by player_name;
  if first.player_name then session_no=0;
  session_no+1;
  distance = coalesce(of distance_:);
  sprint_distance = coalesce(of sprint_distance_:);
  top_speed = coalesce(of top_speed_:);
  keep player_name -- top_speed;
run;

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Show what the output you want from that test data is.

 

 

ciaranhillery
Calcite | Level 5

Due to restrictions with the connection where I am working from at present, I'm unable to create the dataset using the linked instructions. Below is a manual creation of the input session data - hope this works!

 

data WORK.CLASS(label='Session Data');
   infile datalines dsd truncover;
   input Name:$10. Distance_1:$10. Sprint_Distance_1:$10. Top_Speed_1:$10.;
datalines;

Player1 5.5397 1259.38 8.009

Player2 5.9846 1009.018 7.5334

Player3 6.1655 913.654 7.7067

Player4 6.2424 881.055 7.9001

Player5 6.1148 1010.969 8.089

;;;;

 

This will be repeated for subsequent sessions, but if any of players 1-5 didn't attend, then no data will be appended for them.

 

When trying to extract the last 3 sessions attended per player, the steps listed on the initial steps will append all fields to date i.e. Distance_1-6 etc. I only need to keep the final 3 that are populated.

 

What I'm ultimately looking for would be something like:

 

Player NameNumber of Sessions attendedAverage Distance (km)Previous Distance 1Previous Distance 2Previous Distance 3Average Sprint Distance (m)Previous Sprint Distance 1Previous Sprint Distance 2Previous Sprint Distance 3Average Top SpeedPrevious Top Speed 1Previous Top Speed 2Previous Top Speed 3
Player156.44056ABC1350.05DEF8.01428GHI
Kurt_Bremser
Super User

Maxim 19: Don't keep Data in Structure

Which means: don't put the session number in column names. Use the same three columns throughout, and put the session number in it's own column. This "long" dataset will be much easier to work with.

ciaranhillery
Calcite | Level 5

How would this work when I need to merge the needed variables together, and then ultimately trying to create the averages?

Kurt_Bremser
Super User

@ciaranhillery wrote:

How would this work when I need to merge the needed variables together, and then ultimately trying to create the averages?


You use by-group processing in the relevant procedures. The "wide" format prevents this effectively, and is therefore not used for processing, only for display when data is prepared for human consumption.

See this as a short example:

data WORK.CLASS1;
input Name :$10. Distance Sprint_Distance_1 Top_Speed_1;
datalines;
Player1 5.5397 1259.38 8.009
Player2 5.9846 1009.018 7.5334
Player3 6.1655 913.654 7.7067
Player4 6.2424 881.055 7.9001
Player5 6.1148 1010.969 8.089
;
run;

data WORK.CLASS2;
input Name :$10. Distance Sprint_Distance Top_Speed;
datalines;
Player1 5.5397 1259.38 8.009
Player2 5.9846 1009.018 7.5334
Player3 6.1655 913.654 7.7067
Player5 6.1148 1010.969 8.089
;
run;

data WORK.CLASS3;
input Name :$10. Distance Sprint_Distance Top_Speed;
datalines;
Player1 5.5397 1259.38 8.009
Player2 5.9846 1009.018 7.5334
Player4 6.2424 881.055 7.9001
Player5 6.1148 1010.969 8.089
;
run;

data all;
set work.class: indsname=inds;
session = input(substr(scan(inds,2,'.'),6),best.);
run;

proc sort data=all;
by name;
run;

proc means data=all mean;
by name;
var Distance Sprint_Distance_1 Top_Speed_1;
output out=want mean()=;
run;
Tom
Super User Tom
Super User

Just make a vertical dataset instead.  Assuming that you already have the data in this format where the variable names in each dataset is different then here is a quick way to get it into a more normal structure.  This assumes that the individual datasets are already sorted by player_name.

data want ;
  length Player_Name $30 session_no Distance Sprint_Distance Top_Speed 8;
  set session_1 - session_6 ;
  by player_name;
  if first.player_name then session_no=0;
  session_no+1;
  distance = coalesce(of distance_:);
  sprint_distance = coalesce(of sprint_distance_:);
  top_speed = coalesce(of top_speed_:);
  keep player_name -- top_speed;
run;
ciaranhillery
Calcite | Level 5

Thanks Tom, I've been able to work around this code to get what I need.

 

Thank you to everyone for getting back to me!

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1604 views
  • 0 likes
  • 4 in conversation