Hi everyone,
I have a dataset where I have the following variables
Date – the date of the event
Car – a specific car
Driver – name of driver
Client – name of client
Time – time of drive
Here is a sample of the data:
date | car | driver | client | time |
1/1/2018 | 1 | AE/ JG_CF/ RZ | Ventre | 0:50:00 |
1/2/2018 | 1 | TB/ AE_AE/ JG | Pettaway | 1:00:00 |
1/2/2018 | 1 | TB/ AE_AE/ JG | Jeter | 0:42:00 |
1/2/2018 | 2 | CC/ RZ_JV | Davidson | 0:55:00 |
1/3/2018 | 1 | CC/ RZ_FS | molina de valle | 0:15:00 |
1/3/2018 | 1 | CC/ RZ_FS | Gonzalez | 0:43:00 |
1/3/2018 | 2 | TB/ AE_JV | York | 0:25:00 |
1/3/2018 | 2 | TB/ AE_JV | Camara | 0:15:00 |
1/3/2018 | 2 | TB/ AE_JV | Camara | 0:20:00 |
1/4/2018 | 1 | TB/ LP_JG/ MP | Waters | 0:30:00 |
1/4/2018 | 1 | TB/ LP_JG/ MP | Ventre | 0:25:00 |
1/4/2018 | 2 | CC/ RZ_VO/ FS | Perez | 2:23:00 |
1/5/2018 | 1 | MG/ LP_AE/ JV | Gavila | 0:40:00 |
1/5/2018 | 1 | MG/ LP_AE/ JV | Velasquez | 0:25:00 |
1/5/2018 | 1 | MG/ LP_AE/ JV | Grande | 1:10:00 |
1/5/2018 | 2 | CC/ RZ_JG/ SV | Dublin | 0:40:00 |
1/5/2018 | 2 | CC/ RZ_JG/ SV | Leobold | 1:50:00 |
1/8/2018 | 1 | CC/RZ_FS/JV | Davidson | 0:25:00 |
1/8/2018 | 1 | CC/RZ_FS/JV | Glover | 0:13:00 |
1/8/2018 | 2 | AE/TB_SV | Givens | 0:55:00 |
1/9/2018 | 1 | AE/TB_AE/JG | Jones | 1:18:00 |
1/9/2018 | 1 | AE/TB_AE/JG | Grande | 0:20:00 |
1/9/2018 | 1 | AE/TB_AE/JG | Grande | 0:22:00 |
1/9/2018 | 2 | CC/RZ_FS/JV | Molina de Valle | 0:28:00 |
1/10/2018 | 1 | CF/MG_AE | Ventre | 0:15:00 |
1/10/2018 | 2 | CC/RZ_JV/SV | Gonzalez | 0:40:00 |
1/10/2018 | 2 | CC/RZ_JV/SV | Singh | 0:35:00 |
1/10/2018 | 3 | AE/TB_JG/AE | Rodriguez | 0:20:00 |
1/10/2018 | 3 | AE/TB_JG | Jeter | 0:27:00 |
1/10/2018 | 3 | AE/TB_JG | Jiminez | 0:35:00 |
I would like to know the following:
What is the average time if a car has 1 trip in a day, 2 trips in a day, 3 trips in a day, etc?
Any suggestions as to how to proceed would be greatly appreciated.
Thank you!
UNTESTED CODE
proc summary data=have;
class date car;
var time;
output out=have2 mean=;
run;
proc summary data=have2(rename=(_freq_=freq));
class freq;
var time;
output out=want mean=;
run;
Thanks Paige!
That gave me this output:
freq | _TYPE_ | _FREQ | Time |
. | 0 | 700 | 0:41:01 |
1 | 1 | 104 | 0:56:18 |
2 | 1 | 263 | 0:42:15 |
3 | 1 | 122 | 0:32:58 |
4 | 1 | 75 | 0:36:01 |
5 | 1 | 35 | 0:40:46 |
6 | 1 | 35 | 0:35:57 |
7 | 1 | 41 | 0:37:52 |
8 | 1 | 14 | 0:31:02 |
9 | 1 | 5 | 0:27:59 |
10 | 1 | 2 | 0:28:12 |
206 | 1 | 1 | 0:40:46 |
414 | 1 | 1 | 0:41:48 |
451 | 1 | 1 | 0:33:40 |
1072 | 1 | 1 | 0:38:09 |
I'm not quite sure what to make of this?
First, you can ignore the row where _type_=0
When freq=1, these are the situations where a car was driven only once in a day, and the value under TIME is the mean of all those times for cars driven once per day..
When freq=2, these are the situations where a car was driven twice in a day, and the value under TIME is the mean of the times when it was driven twice in a day
And so on
Please run this minor modification to the code:
proc summary data=have nway;
class date car;
var time;
output out=have2 mean=;
run;
proc summary data=have2(rename=(_freq_=freq)) nway;
class freq;
var time;
output out=want mean=;
run;
@hwangnyc wrote:
Hi everyone,
I have a dataset where I have the following variables
Date – the date of the event
Car – a specific car
Driver – name of driver
Client – name of client
Time – time of driveHere is a sample of the data:
date
car
driver
client
time
1/1/2018
1
AE/ JG_CF/ RZ
Ventre
0:50:00
1/2/2018
1
TB/ AE_AE/ JG
Pettaway
1:00:00
1/2/2018
1
TB/ AE_AE/ JG
Jeter
0:42:00
1/2/2018
2
CC/ RZ_JV
Davidson
0:55:00
1/3/2018
1
CC/ RZ_FS
molina de valle
0:15:00
1/3/2018
1
CC/ RZ_FS
Gonzalez
0:43:00
1/3/2018
2
TB/ AE_JV
York
0:25:00
1/3/2018
2
TB/ AE_JV
Camara
0:15:00
1/3/2018
2
TB/ AE_JV
Camara
0:20:00
1/4/2018
1
TB/ LP_JG/ MP
Waters
0:30:00
1/4/2018
1
TB/ LP_JG/ MP
Ventre
0:25:00
1/4/2018
2
CC/ RZ_VO/ FS
Perez
2:23:00
1/5/2018
1
MG/ LP_AE/ JV
Gavila
0:40:00
1/5/2018
1
MG/ LP_AE/ JV
Velasquez
0:25:00
1/5/2018
1
MG/ LP_AE/ JV
Grande
1:10:00
1/5/2018
2
CC/ RZ_JG/ SV
Dublin
0:40:00
1/5/2018
2
CC/ RZ_JG/ SV
Leobold
1:50:00
1/8/2018
1
CC/RZ_FS/JV
Davidson
0:25:00
1/8/2018
1
CC/RZ_FS/JV
Glover
0:13:00
1/8/2018
2
AE/TB_SV
Givens
0:55:00
1/9/2018
1
AE/TB_AE/JG
Jones
1:18:00
1/9/2018
1
AE/TB_AE/JG
Grande
0:20:00
1/9/2018
1
AE/TB_AE/JG
Grande
0:22:00
1/9/2018
2
CC/RZ_FS/JV
Molina de Valle
0:28:00
1/10/2018
1
CF/MG_AE
Ventre
0:15:00
1/10/2018
2
CC/RZ_JV/SV
Gonzalez
0:40:00
1/10/2018
2
CC/RZ_JV/SV
Singh
0:35:00
1/10/2018
3
AE/TB_JG/AE
Rodriguez
0:20:00
1/10/2018
3
AE/TB_JG
Jeter
0:27:00
1/10/2018
3
AE/TB_JG
Jiminez
0:35:00
I would like to know the following:
What is the average time if a car has 1 trip in a day, 2 trips in a day, 3 trips in a day, etc?
Any suggestions as to how to proceed would be greatly appreciated.
Thank you!
Did you have starting data where it was one time per driver per day before making the (very difficult to use consistently driver mashup variable)?
If so
proc means data=have mean;
class car date;
var time;
run;
Note that no where in your example data is actually a "number of trips per day", which you did have shown a couple of ways to get in
https://communities.sas.com/t5/New-SAS-User/Reshaping-data-from-long-to-wide-using-counts/m-p/500552
but you have chosen possibly the most awkward way of approaching this problem.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.