Help using Base SAS procedures

Need some help with grouping and summing.

Reply
Occasional Contributor
Posts: 6

Need some help with grouping and summing.

Hi all,

First post here, fairly new to SAS. Been lurking the forums for a while recently. Great website.

Was trying to get a little bit of help around something I'm trying to do on SAS.

I have a number of different activities (lets say hours of sport x played during a week) that different individuals are doing and am trying to find the sum of time done per sport over a period of time, lets say one year.

I'm not sure what the correct way to go about this is but I'm assuming a group by using SQL would be a good start.

In addition I'm trying to see if its possible to add some IF statements into that procedure. For example, if I want to start counting the sum of hours played after a certain date (say last week in the month or weekends) and if its less or greater than another variable say (days played in weekdays or average hours played) then flag players if they have gone either over or under that condition.

Many thanks in advance and hopefully my descriptions make sense!

Godspeed and have a great day.

Super User
Posts: 19,822

Re: Need some help with grouping and summing.

Posted in reply to cautious_steve

1) We'd need to know a bit more about how your data is structured, what are your variables and how are they formatted, date  or week for time?

2) There are many ways to do what you want, but from a first glance, I might suggest a proc format with a mulitlabel format and procedure such as proc means or tabulate. 

Occasional Contributor
Posts: 6

Re: Need some help with grouping and summing.

The identifier is structured in terms of the name of the player eg John Smith, each player may have several observations within the dataset each being for a time they trained (in numeric hours eg 4.5 or 5.0) or played a sport (eg sport 1-20).
There's a date variable in day month and year eg 12-03-2012, which records the day in which player x participated in sport y and for how long. The multiple observations per player basically specify the number of times they trained during a time period.

I'm interested in finding the total training time split by the player over the time period as well as if a player has overtrained or undertrained, and by how many hours. Theres a variable that specifies how many hours a player should train after a certain date eg. no more than 10 hours in total after day 20 of the month or after event x date from another variable. Literally want to begin counting the total sum of hours trained by the player after that certain date and to flag if he/she has gone over or under.

Hopefully it makes sense

SAS Super FREQ
Posts: 8,868

Re: Need some help with grouping and summing.

Posted in reply to cautious_steve

Hi:

  Generally, it's most helpful if you provide a program with a small sample of data that everybody can use. I really can't visualize the structure of your data except that you have multiple rows per person. It almost sounds to me that you have different data on different rows. Without seeing the data, it's hard to come up with a PROC SQL or a PROC REPORT or DATA step to do what you want.

    

  You don't say whether you want a report or an output dataset. Knowing the structure of the output or the type of output would be useful.

  

  Below is an example of a little program that creates a SAS dataset. I made up the variable names based on what I interpreted from your post. But you're the only one who knows what your data looks like and how you'll read it into SAS form for processing. I'm still not clear, for example on how your data has a variable that says how many hours a player should train after a certain date. Is it the same date for every player? So wouldn't there be at least 2 date variables -- the date variable they played a sport and then the date that they would train after? (as an example...)

cynthia

data sport_info;
  length name $20 sport_code $12;
  infile datalines dlm=',' dsd;
  input name $ hours_trained sport_code $ time_sport date_sport : mmddyy10. should_train_hours;
return;
datalines;
"John Smith",4,"running",8,12-03-2012,10
"John Smith",4,"cycling",4,12-05-2012,3
"Mary Jones",12,"skiing",2,12-03-2012,10
;
run;
  
ods html file='c:\temp\sportinfo.html';

proc print data=sport_info;
format date_sport mmddyyd10.;
run;
ods html close;

Occasional Contributor
Posts: 6

Re: Need some help with grouping and summing.

Posted in reply to Cynthia_sas

http://www.2shared.com/file/MLBNxfDv/Sportsdata.html

I've uploaded a copy of the data on Excel and also my idealized results of it as well as what I'm after. I've included some descriptions of what the data means as well as some intuition around what needs to be done.

Hopefully it makes sense. Cheers!

Occasional Contributor
Posts: 6

Re: Need some help with grouping and summing.

Posted in reply to cautious_steve

bump

Contributor
Posts: 29

Re: Need some help with grouping and summing.

Posted in reply to cautious_steve

could you just please paste example of data here? 5 lines with all collumns and we will try to help you.

Occasional Contributor
Posts: 6

Re: Need some help with grouping and summing.

Player NameEvent DateEvent TypeEvent NameEvent Amount
John Smith12/03/2012TrainSoccer5
John Smith17/03/2012TrainSoccer4.5
John Smith21/03/2012BurstSoccer3
Mary Weed1/03/2012BurstTennis2
Mary Weed7/03/2012TrainTennis4
James Wood2/03/2012TrainCross Country3
James Wood11/03/2012TrainCross Country3
James Wood13/03/2012BurstCross Country5
James Wood16/03/2012TrainCross Country3
James Wood16/03/2012TrainCross Country3
James Wood19/03/2012BurstSwimming10
James Wood20/03/2012TrainSwimming3
James Wood22/03/2012TrainSwimming2
Peter Mach7/03/2012TrainSoccer5
George Poll17/03/2012BurstIce Hockey8
George Poll20/03/2012TrainIce Hockey12
Occasional Contributor
Posts: 6

Re: Need some help with grouping and summing.

Posted in reply to cautious_steve
* Data explainations: Event date basically specifies when a task was completed or trigger is incurred (for Bursts etc.), train refers to the amount trained during that day (under event amount) and for what sport activity.
When Event Type is 'Burst', it refers to the amount of time needed to be trained by the athlete overall within 7 days of the event being triggered. 'Burst' does not refer to amount of time trained, its only a trigger mechanism
* Goals: To create a total hours trained variable and also to flag if athlete completed aimed training length for time period

My calculations should be like:

* Burst amounts shouldn’t be included* Has athlete met Burst target?*Amount of time trained in that sports discipline after burst triggerLast day, which summation should count towards meeting target
Player NameEvent DateEvent TypeEvent NameEvent AmountTotal Amount TrainedTarget FlagPost Burst TrainingBurst End Date
John Smith12/03/2012TrainSoccer5
John Smith17/03/2012TrainSoccer4.5
John Smith21/03/2012BurstSoccer39.5NO28/03/2012
Mary Weed1/03/2012BurstTennis208/03/2012
Mary Weed7/03/2012TrainTennis44YES
James Wood2/03/2012TrainCross Country3
James Wood11/03/2012TrainCross Country3
James Wood13/03/2012BurstCross Country520/03/2012
James Wood16/03/2012TrainCross Country3
James Wood16/03/2012TrainCross Country312YES6
James Wood19/03/2012BurstSwimming1026/03/2012
James Wood20/03/2012TrainSwimming3
James Wood22/03/2012TrainSwimming2
Peter Mach7/03/2012TrainSoccer510YES10
George Poll17/03/2012BurstIce Hockey824/03/2012
George Poll20/03/2012TrainIce Hockey1212YES12

Ideal results should be like:

* Data should be finalized in something like this form
Player NameTotal Amount TrainedTarget Flag
John Smith9.5NO
Mary Weed4YES
James Wood22YES
Peter Mach5YES
George Poll12YES

Hopefully this makes sense?

Contributor
Posts: 29

Re: Need some help with grouping and summing.

Posted in reply to cautious_steve

i didnt understand meaning of burst ant target flag as johny smith has burst if 3 and total training of 9.5 and it more than burst. Also, event dates for john smith are 12 17 and 21 so difference is more than 7 days so we do not include 12th day into burst?

however i will try to write some code example (may contain errors):

1. make data set that contains athletes, types and bursts:

data burst;

set original; if event_type = 'burst';

week = week('event_date'd, 'v'); /* makes value of which describes which week of the year is it so we can identify if there was enough training that week. you can modify that to other date by playing with sas date statemts */

burst_amount = event_amount; /* make value of needed training per week */

run;

2. make data set without burst and modify with week of the year:

data train;

set original; if event_type ^= 'burst';

week = week('event_date'd, 'v'); /* makes value of which describes which week of the year is it so we can group training hours by week */

run;

3. sum training hours:

proc sql;

create table training_total as

select name, event_type, week, sum(event_amount) as total_amount from train group by name, event_type, week;

quit;

4. get final result:

proc sql;

create table resuls as

select * from (

(select * from burst) s1

left join

(select * from training_total) s2

on s1.name = s2.name and s1.event_type = s2.event_type and s1.week = s2.week);

quit;

data result;

set result;

if total_amount >= burst_amount than target_flag = 'YES';    else target_flag = 'NO';

keep name event_type week total_amount target_flag;

run;

im sure there will be mistaces in code as im just puting it right here and havent tested it. is the concept clear to you?

Ask a Question
Discussion stats
  • 9 replies
  • 240 views
  • 0 likes
  • 4 in conversation