Re: Trying to get some means but really struggling. Please help!

hwangnyc · Posted 10-02-2018 12:01 PM

Hi everyone,

I have a dataset where I have the following variables

Date – the date of the event
Car – a specific car
Driver – name of driver
Client – name of client
Time – time of drive

Here is a sample of the data:

date	car	driver	client	time
1/1/2018	1	AE/ JG_CF/ RZ	Ventre	0:50:00
1/2/2018	1	TB/ AE_AE/ JG	Pettaway	1:00:00
1/2/2018	1	TB/ AE_AE/ JG	Jeter	0:42:00
1/2/2018	2	CC/ RZ_JV	Davidson	0:55:00
1/3/2018	1	CC/ RZ_FS	molina de valle	0:15:00
1/3/2018	1	CC/ RZ_FS	Gonzalez	0:43:00
1/3/2018	2	TB/ AE_JV	York	0:25:00
1/3/2018	2	TB/ AE_JV	Camara	0:15:00
1/3/2018	2	TB/ AE_JV	Camara	0:20:00
1/4/2018	1	TB/ LP_JG/ MP	Waters	0:30:00
1/4/2018	1	TB/ LP_JG/ MP	Ventre	0:25:00
1/4/2018	2	CC/ RZ_VO/ FS	Perez	2:23:00
1/5/2018	1	MG/ LP_AE/ JV	Gavila	0:40:00
1/5/2018	1	MG/ LP_AE/ JV	Velasquez	0:25:00
1/5/2018	1	MG/ LP_AE/ JV	Grande	1:10:00
1/5/2018	2	CC/ RZ_JG/ SV	Dublin	0:40:00
1/5/2018	2	CC/ RZ_JG/ SV	Leobold	1:50:00
1/8/2018	1	CC/RZ_FS/JV	Davidson	0:25:00
1/8/2018	1	CC/RZ_FS/JV	Glover	0:13:00
1/8/2018	2	AE/TB_SV	Givens	0:55:00
1/9/2018	1	AE/TB_AE/JG	Jones	1:18:00
1/9/2018	1	AE/TB_AE/JG	Grande	0:20:00
1/9/2018	1	AE/TB_AE/JG	Grande	0:22:00
1/9/2018	2	CC/RZ_FS/JV	Molina de Valle	0:28:00
1/10/2018	1	CF/MG_AE	Ventre	0:15:00
1/10/2018	2	CC/RZ_JV/SV	Gonzalez	0:40:00
1/10/2018	2	CC/RZ_JV/SV	Singh	0:35:00
1/10/2018	3	AE/TB_JG/AE	Rodriguez	0:20:00
1/10/2018	3	AE/TB_JG	Jeter	0:27:00
1/10/2018	3	AE/TB_JG	Jiminez	0:35:00

I would like to know the following:

What is the average time if a car has 1 trip in a day, 2 trips in a day, 3 trips in a day, etc?

Any suggestions as to how to proceed would be greatly appreciated.

Thank you!

PaigeMiller · Posted 10-02-2018 12:41 PM

UNTESTED CODE

proc summary data=have;
    class date car;
    var time;
    output out=have2 mean=;
run;

proc summary data=have2(rename=(_freq_=freq));
    class freq;
    var time;
    output out=want mean=;
run;

--
Paige Miller

hwangnyc · Posted 10-02-2018 01:02 PM

Thanks Paige!

That gave me this output:

freq	_TYPE_	_FREQ	Time
.	0	700	0:41:01
1	1	104	0:56:18
2	1	263	0:42:15
3	1	122	0:32:58
4	1	75	0:36:01
5	1	35	0:40:46
6	1	35	0:35:57
7	1	41	0:37:52
8	1	14	0:31:02
9	1	5	0:27:59
10	1	2	0:28:12
206	1	1	0:40:46
414	1	1	0:41:48
451	1	1	0:33:40
1072	1	1	0:38:09

I'm not quite sure what to make of this?

PaigeMiller · Posted 10-02-2018 03:34 PM

First, you can ignore the row where _type_=0

When freq=1, these are the situations where a car was driven only once in a day, and the value under TIME is the mean of all those times for cars driven once per day..

When freq=2, these are the situations where a car was driven twice in a day, and the value under TIME is the mean of the times when it was driven twice in a day

And so on

Please run this minor modification to the code:

proc summary data=have nway;
    class date car;
    var time;
    output out=have2 mean=;
run;

proc summary data=have2(rename=(_freq_=freq)) nway;
    class freq;
    var time;
    output out=want mean=;
run;

--
Paige Miller

ballardw · Posted 10-02-2018 01:13 PM

@hwangnyc wrote:

Hi everyone,

I have a dataset where I have the following variables

Date – the date of the event
Car – a specific car
Driver – name of driver
Client – name of client
Time – time of drive

Here is a sample of the data:

date

car

driver

client

time

1/1/2018

1

AE/ JG_CF/ RZ

Ventre

0:50:00

1/2/2018

1

TB/ AE_AE/ JG

Pettaway

1:00:00

1/2/2018

1

TB/ AE_AE/ JG

Jeter

0:42:00

1/2/2018

2

CC/ RZ_JV

Davidson

0:55:00

1/3/2018

1

CC/ RZ_FS

molina de valle

0:15:00

1/3/2018

1

CC/ RZ_FS

Gonzalez

0:43:00

1/3/2018

2

TB/ AE_JV

York

0:25:00

1/3/2018

2

TB/ AE_JV

Camara

0:15:00

1/3/2018

2

TB/ AE_JV

Camara

0:20:00

1/4/2018

1

TB/ LP_JG/ MP

Waters

0:30:00

1/4/2018

1

TB/ LP_JG/ MP

Ventre

0:25:00

1/4/2018

2

CC/ RZ_VO/ FS

Perez

2:23:00

1/5/2018

1

MG/ LP_AE/ JV

Gavila

0:40:00

1/5/2018

1

MG/ LP_AE/ JV

Velasquez

0:25:00

1/5/2018

1

MG/ LP_AE/ JV

Grande

1:10:00

1/5/2018

2

CC/ RZ_JG/ SV

Dublin

0:40:00

1/5/2018

2

CC/ RZ_JG/ SV

Leobold

1:50:00

1/8/2018

1

CC/RZ_FS/JV

Davidson

0:25:00

1/8/2018

1

CC/RZ_FS/JV

Glover

0:13:00

1/8/2018

2

AE/TB_SV

Givens

0:55:00

1/9/2018

1

AE/TB_AE/JG

Jones

1:18:00

1/9/2018

1

AE/TB_AE/JG

Grande

0:20:00

1/9/2018

1

AE/TB_AE/JG

Grande

0:22:00

1/9/2018

2

CC/RZ_FS/JV

Molina de Valle

0:28:00

1/10/2018

1

CF/MG_AE

Ventre

0:15:00

1/10/2018

2

CC/RZ_JV/SV

Gonzalez

0:40:00

1/10/2018

2

CC/RZ_JV/SV

Singh

0:35:00

1/10/2018

3

AE/TB_JG/AE

Rodriguez

0:20:00

1/10/2018

3

AE/TB_JG

Jeter

0:27:00

1/10/2018

3

AE/TB_JG

Jiminez

0:35:00

I would like to know the following:

What is the average time if a car has 1 trip in a day, 2 trips in a day, 3 trips in a day, etc?

Any suggestions as to how to proceed would be greatly appreciated.

Thank you!

Did you have starting data where it was one time per driver per day before making the (very difficult to use consistently driver mashup variable)?

If so

proc means data=have mean;

class car date;

var time;

run;

Note that no where in your example data is actually a "number of trips per day", which you did have shown a couple of ways to get in

https://communities.sas.com/t5/New-SAS-User/Reshaping-data-from-long-to-wide-using-counts/m-p/500552

but you have chosen possibly the most awkward way of approaching this problem.

Trying to get some means but really struggling. Please help!