BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

Hello,

I have a dataset where each participant had a program installed on their phones for several days (long format). The program measured the person's usage of a certain phone app in Duration in seconds (Duration_s) several times a day per person. Each row indicates the participant ID (Ppt_ID), the start datetime of the app usage (Start), the end datetime of the app usage (End), and Duration_s. See below the code:

 

Data Duration;
Input Ppt_ID Start :DATETIME16. End :DATETIME16. Duration_s;
format Start DATETIME16. End DATETIME16.;
datalines;
1	26May20:17:00:47	26May20:17:00:47	0
1	26May20:17:00:47	26May20:17:01:11	24
1	26May20:19:38:10	26May20:19:38:10	0
1	26May20:19:38:10	26May20:19:38:12	2
1	26May20:20:08:41	26May20:20:09:51	70
1	26May20:20:09:51	26May20:20:10:19	28
1	26May20:20:28:25	26May20:20:28:26	1
1	26May20:20:28:26	26May20:20:28:41	15
1	26May20:20:44:08	26May20:20:44:11	3
1	26May20:21:55:30	26May20:21:55:32	2
1	26May20:21:59:03	26May20:21:59:07	4
1	26May20:23:43:05	26May20:23:43:05	0
1	26May20:23:43:05	26May20:23:43:49	44
1	27May20:00:20:01	27May20:00:20:14	13
1	27May20:00:31:22	27May20:00:31:25	3
1	27May20:06:05:52	27May20:06:06:04	12
1	27May20:18:11:33	27May20:18:11:52	19
1	27May20:19:15:36	27May20:19:15:50	14
1	27May20:19:17:09	27May20:19:17:11	2
1	28May20:00:14:34	28May20:00:15:39	65
1	28May20:00:34:06	28May20:00:34:06	0
1	28May20:00:34:06	28May20:00:34:09	3
1	28May20:16:09:20	28May20:16:09:21	1
1	28May20:16:09:21	28May20:16:09:37	16
1	28May20:16:09:37	28May20:16:09:43	6
1	28May20:16:10:37	28May20:16:12:59	142
1	28May20:16:12:59	28May20:16:13:11	12
1	28May20:16:15:08	28May20:16:15:29	21
1	28May20:16:17:39	28May20:16:17:42	3
1	28May20:16:17:42	28May20:16:18:41	59
1	28May20:16:20:01	28May20:16:20:25	24
1	28May20:16:22:00	28May20:16:22:16	16
1	28May20:16:27:12	28May20:16:27:30	18
1	13Jul20:21:30:23	13Jul20:21:30:23	0
1	13Jul20:21:30:23	13Jul20:21:32:13	110
1	13Jul20:22:02:51	13Jul20:22:02:55	4
1	13Jul20:22:02:55	13Jul20:22:02:57	2
1	13Jul20:22:05:15	13Jul20:22:05:21	6
1	13Jul20:22:23:14	13Jul20:22:23:14	0
1	13Jul20:22:23:14	13Jul20:22:23:18	4
1	13Jul20:22:23:24	13Jul20:22:24:52	88
1	13Jul20:22:26:41	13Jul20:22:27:14	33
1	13Jul20:22:35:45	13Jul20:22:50:24	879
1	13Jul20:22:59:23	13Jul20:22:59:30	7
1	13Jul20:22:59:37	13Jul20:22:59:51	14
1	13Jul20:23:00:08	13Jul20:23:00:20	12
1	13Jul20:23:00:25	13Jul20:23:23:37	1392
1	13Jul20:23:23:37	13Jul20:23:34:12	635
1	14Jul20:20:15:06	14Jul20:20:38:29	1403
1	14Jul20:20:38:29	14Jul20:20:43:17	288
1	14Jul20:20:55:03	14Jul20:20:57:49	166
1	14Jul20:21:04:56	14Jul20:21:06:07	71
1	14Jul20:21:06:07	14Jul20:21:06:16	9
1	14Jul20:21:10:07	14Jul20:21:12:29	142
1	14Jul20:22:32:18	14Jul20:22:34:05	107
1	14Jul20:22:40:38	14Jul20:22:40:41	3
1	14Jul20:22:40:41	14Jul20:22:40:48	7
1	14Jul20:22:49:27	14Jul20:22:49:28	1
1	14Jul20:22:49:28	14Jul20:22:49:30	2
1	14Jul20:22:49:39	14Jul20:22:49:49	10
1	14Jul20:23:28:04	14Jul20:23:28:06	2
1	14Jul20:23:28:10	14Jul20:23:32:09	239
1	14Jul20:23:44:08	14Jul20:23:44:35	27
1	14Jul20:23:44:35	14Jul20:23:48:08	213
1	15Jul20:16:46:31	15Jul20:16:46:53	22
1	15Jul20:16:46:53	15Jul20:16:48:53	120
1	15Jul20:16:51:33	15Jul20:16:56:04	271
1	15Jul20:16:56:04	15Jul20:16:57:02	58
1	15Jul20:19:44:30	15Jul20:19:45:32	62
1	15Jul20:19:45:32	15Jul20:19:45:32	0
1	15Jul20:21:31:33	15Jul20:21:31:37	4
1	15Jul20:22:50:46	15Jul20:22:51:41	55
1	15Jul20:22:51:44	15Jul20:22:52:06	22
1	15Jul20:23:25:45	15Jul20:23:27:33	108
1	16Jul20:22:00:07	16Jul20:22:02:11	124
1	16Jul20:22:02:17	16Jul20:22:03:19	62
1	16Jul20:22:03:23	16Jul20:22:03:32	9
1	16Jul20:22:03:35	16Jul20:22:03:38	3
1	16Jul20:22:03:43	16Jul20:22:03:48	5
1	16Jul20:22:03:53	16Jul20:22:33:24	1771
1	16Jul20:22:33:24	16Jul20:22:38:59	335
1	16Jul20:22:38:59	16Jul20:22:39:30	31
1	16Jul20:22:39:35	16Jul20:22:40:38	63
1	16Jul20:23:39:08	16Jul20:23:39:44	36
1	16Jul20:23:40:24	16Jul20:23:47:26	422
1	16Jul20:23:47:26	16Jul20:23:58:03	637
1	17Jul20:23:20:34	17Jul20:23:31:14	640
1	17Jul20:23:31:14	17Jul20:23:31:19	5
1	17Jul20:23:31:25	17Jul20:23:34:11	166
1	17Jul20:23:34:15	17Jul20:23:34:33	18
1	17Jul20:23:34:33	17Jul20:23:34:34	1
1	17Jul20:23:34:34	17Jul20:23:35:01	27
1	17Jul20:23:35:04	17Jul20:23:41:41	397
1	17Jul20:23:41:41	17Jul20:23:42:23	42
1	17Jul20:23:42:42	17Jul20:23:42:43	1
1	17Jul20:23:42:43	17Jul20:23:46:38	235
1	17Jul20:23:46:38	17Jul20:23:46:41	3
1	17Jul20:23:46:41	17Jul20:23:47:08	27
1	17Jul20:23:47:08	17Jul20:23:57:19	611
1	17Jul20:23:57:19	17Jul20:23:57:20	1
1	17Jul20:23:58:24	18Jul20:00:47:45	2961
1	18Jul20:19:09:16	18Jul20:19:13:38	262
1	18Jul20:19:48:19	18Jul20:19:52:05	226
1	18Jul20:19:52:05	18Jul20:19:54:23	138
1	18Jul20:19:52:36	18Jul20:19:52:36	0
1	18Jul20:19:52:36	18Jul20:20:04:02	686
1	18Jul20:20:04:02	18Jul20:20:20:53	1011
1	18Jul20:20:20:53	18Jul20:20:34:00	787
1	18Jul20:22:29:25	18Jul20:22:30:00	35
1	18Jul20:22:30:00	18Jul20:22:30:06	6
1	18Jul20:23:12:58	18Jul20:23:12:59	1
1	18Jul20:23:12:59	18Jul20:23:13:04	5
1	18Jul20:23:13:11	18Jul20:23:15:27	136
1	18Jul20:23:15:27	18Jul20:23:15:46	19
1	18Jul20:23:15:59	18Jul20:23:16:08	9
1	18Jul20:23:16:08	18Jul20:23:22:01	353
1	18Jul20:23:22:01	18Jul20:23:23:23	82
1	18Jul20:23:23:23	18Jul20:23:25:24	121
1	18Jul20:23:40:30	18Jul20:23:40:47	17
1	18Jul20:23:40:47	18Jul20:23:48:52	485
1	19Jul20:18:12:03	19Jul20:18:12:03	0
1	19Jul20:18:12:09	19Jul20:18:12:13	4
1	19Jul20:18:12:24	19Jul20:18:13:15	51
1	19Jul20:19:16:24	19Jul20:19:16:34	10
1	19Jul20:19:16:34	19Jul20:19:16:34	0
1	19Jul20:19:16:37	19Jul20:19:16:39	2
1	19Jul20:19:16:41	19Jul20:19:16:41	0
1	19Jul20:19:16:41	19Jul20:19:16:58	17
1	19Jul20:21:19:30	19Jul20:21:19:30	0
1	19Jul20:21:19:30	19Jul20:21:19:32	2
1	19Jul20:21:19:38	19Jul20:21:20:52	74
1	19Jul20:22:15:25	19Jul20:22:40:01	1476
1	19Jul20:23:22:05	19Jul20:23:22:32	27
1	19Jul20:23:22:32	19Jul20:23:22:54	22
2	10Jun20:20:29:43	10Jun20:20:32:50	187
2	10Jun20:20:32:50	10Jun20:20:33:05	15
2	10Jun20:20:33:05	10Jun20:20:35:48	163
2	10Jun20:20:38:00	10Jun20:20:40:48	168
2	10Jun20:20:40:48	10Jun20:20:40:51	3
2	10Jun20:20:40:51	10Jun20:20:40:56	5
2	10Jun20:20:41:09	10Jun20:20:43:24	135
2	10Jun20:20:43:24	10Jun20:20:44:12	48
2	10Jun20:20:44:15	10Jun20:20:52:23	488
2	10Jun20:20:52:23	10Jun20:20:55:07	164
2	10Jun20:20:55:07	10Jun20:21:06:00	653
2	10Jun20:21:29:41	10Jun20:21:50:23	1242
2	10Jun20:21:50:23	10Jun20:21:50:24	1
2	10Jun20:21:50:30	10Jun20:21:51:28	58
2	10Jun20:21:51:28	10Jun20:21:53:40	132
2	11Jun20:14:41:40	11Jun20:14:42:40	1
2	11Jun20:23:31:08	11Jun20:23:34:25	197
2	11Jun20:23:34:25	11Jun20:23:36:01	96
2	11Jun20:23:36:01	11Jun20:23:36:03	2
2	11Jun20:23:36:05	11Jun20:23:39:43	218
2	11Jun20:23:39:43	11Jun20:23:41:51	128
2	11Jun20:23:41:51	11Jun20:23:42:15	24
2	11Jun20:23:42:23	11Jun20:23:44:44	141
2	11Jun20:23:44:44	11Jun20:23:47:03	139
2	11Jun20:23:47:03	11Jun20:23:47:04	1
2	11Jun20:23:48:00	11Jun20:23:48:28	28
2	11Jun20:23:48:28	11Jun20:23:48:55	27
2	11Jun20:23:48:55	11Jun20:23:53:05	250
2	11Jun20:23:53:05	11Jun20:23:58:36	331
2	11Jun20:23:58:36	12Jun20:00:04:12	336
2	12Jun20:21:52:43	12Jun20:21:58:06	323
2	12Jun20:21:58:06	12Jun20:22:00:48	162
2	12Jun20:22:00:48	12Jun20:22:01:11	23
2	12Jun20:22:01:11	12Jun20:22:03:08	117
2	12Jun20:22:04:51	12Jun20:22:12:45	474
2	12Jun20:22:12:45	12Jun20:22:13:38	53
2	12Jun20:22:13:59	12Jun20:22:14:11	12
2	12Jun20:22:14:11	12Jun20:22:23:22	551
2	12Jun20:22:23:22	12Jun20:22:23:59	37
2	12Jun20:22:23:59	12Jun20:22:24:28	29
2	12Jun20:22:24:28	12Jun20:22:25:41	73
2	12Jun20:22:25:41	12Jun20:22:56:15	1834
2	12Jun20:22:56:15	12Jun20:22:56:23	8
2	12Jun20:22:56:23	12Jun20:23:27:15	1852
2	12Jun20:23:27:15	12Jun20:23:29:08	113
2	12Jun20:23:29:08	12Jun20:23:44:16	908
2	12Jun20:23:44:16	12Jun20:23:45:58	102
2	12Jun20:23:46:55	13Jun20:00:20:55	2040
2	13Jun20:21:59:55	13Jun20:21:59:55	0
2	13Jun20:22:00:00	13Jun20:22:00:01	1
2	13Jun20:22:00:01	13Jun20:22:00:09	8
2	13Jun20:22:00:11	13Jun20:22:00:14	3
2	13Jun20:22:00:14	13Jun20:22:00:26	12
2	13Jun20:22:00:35	13Jun20:22:00:36	1
2	13Jun20:22:00:36	13Jun20:22:00:51	15
2	13Jun20:22:00:51	13Jun20:22:00:56	5
2	13Jun20:22:00:56	13Jun20:22:00:57	1
2	13Jun20:22:00:57	13Jun20:22:13:35	758
2	13Jun20:22:13:35	13Jun20:22:13:38	3
2	13Jun20:22:13:38	13Jun20:22:13:49	11
2	13Jun20:22:13:49	13Jun20:22:15:42	113
2	13Jun20:22:15:42	13Jun20:22:15:54	12
2	13Jun20:22:15:54	13Jun20:22:34:14	1100
2	13Jun20:22:34:33	13Jun20:22:38:58	265
2	13Jun20:22:39:37	13Jun20:22:39:44	7
2	13Jun20:22:40:26	13Jun20:22:54:30	844
2	13Jun20:22:54:30	13Jun20:22:54:34	4
2	13Jun20:22:54:34	13Jun20:23:34:57	2423
2	14Jun20:00:04:17	14Jun20:00:08:24	247
2	14Jun20:21:10:26	14Jun20:21:11:37	71
2	14Jun20:21:11:48	14Jun20:21:15:17	209
2	14Jun20:21:15:17	14Jun20:21:33:48	1111
2	14Jun20:21:33:48	14Jun20:21:45:01	673
2	14Jun20:21:45:01	14Jun20:21:45:44	43
2	14Jun20:21:45:44	14Jun20:21:47:04	80
2	14Jun20:21:47:04	14Jun20:22:08:04	1260
2	14Jun20:22:08:04	14Jun20:22:10:27	143
2	14Jun20:22:10:27	14Jun20:22:23:59	812
2	14Jun20:22:23:59	14Jun20:22:39:50	951
2	14Jun20:22:42:20	14Jun20:22:59:23	1023
2	14Jun20:23:00:06	14Jun20:23:34:47	2081
2	14Jun20:23:34:50	14Jun20:23:49:57	907
2	14Jun20:23:49:57	14Jun20:23:59:16	559
2	14Jun20:23:59:16	15Jun20:00:18:12	1136
2	15Jun20:23:03:40	15Jun20:23:03:59	19
2	15Jun20:23:03:59	15Jun20:23:04:14	15
2	15Jun20:23:04:14	15Jun20:23:09:14	300
2	15Jun20:23:09:14	15Jun20:23:10:21	67
2	15Jun20:23:11:21	15Jun20:23:14:50	209
2	15Jun20:23:14:50	15Jun20:23:18:04	194
2	15Jun20:23:18:04	15Jun20:23:18:11	7
2	15Jun20:23:18:11	15Jun20:23:23:00	289
2	15Jun20:23:23:00	15Jun20:23:32:54	594
2	15Jun20:23:32:54	15Jun20:23:33:24	30
2	15Jun20:23:33:24	15Jun20:23:35:49	145
2	15Jun20:23:35:49	15Jun20:23:37:27	98
2	15Jun20:23:38:34	15Jun20:23:40:16	102
2	15Jun20:23:40:16	15Jun20:23:44:19	243
2	15Jun20:23:44:19	15Jun20:23:44:23	4
2	15Jun20:23:54:59	15Jun20:23:55:03	4
2	15Jun20:23:55:32	16Jun20:00:13:29	1077
Run;

I want to calculate the number of days the program was installed per person. Normally I would do proc means by Ppt_ID, and calculate days_installed as maximum End datetime per person - minimum Start datetime per person. However, there are times when the person uninstalled the program and re-installed the program several days later (as can be seen in the gap in datetimes between rows 33 and 34). The calculation I proposed would not account for the large gaps where the program was not in use. What I would like to do is the following:

 

1) Where a person has continuous program usage (e.g., Ppt_ID #2), meaning that each Start datetime starts <1  day from the previous row's End datetime, calculate the total days of program installation (per person) using the method I described above: last End datetime - first Start datetime for that person.

2) Where a person has >=1 days between an End datetime and the next row's Start datetime (e.g., Ppt_ID #1), minus that gap in days from the total amount of days installed. In other words, per person, add the time for each set of rows that are <1 day from each other, and then add the amount of time for the next set of days that are <1 day from each other.

 

An example would be for Ppt_ID #1, the first "chunk" of time would be 26MAY20:17:00:47 (Start, row 1) to 28MAY20:16:27:30 (End, row 33); and the second "chunk" of time would be 13JUL20:21:30:23 (Start, row 34) to 19JUL20:23:22:54 (End, row 134). The total amount of time for this person would be ((28MAY20:16:27:30 - 26MAY20:17:00:47) + (19JUL20:23:22:54 - 13JUL20:21:30:23)) / 86400, or 8.055023148 days for Ppt_ID #1. For ID 2, there are no gaps >=1 day, so the number of days would just be (16JUN20:00:13:29 (End, row 235) - 10JUN20:20:29:43 (Start, row 135)) / 86400, or 5.155393519 days for Ppt_ID #2.

 

Could someone assist? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

Hi,

 

Thanks for supplying the data in a data step and the detailed explanation, a good example to other questioners.

 

The following might get you started, it appears to give the results you indicated, but might need refinement, e.g., if there is only one observation for a particular ppt_id then the code will need to be tweaked to handle that.

 

It's late where I am, so all the best!

 

 

 

data want(keep = ppt_id days_installed);
  /* process each ppt_id in a loop */
  do until (last.ppt_id);
    set duration;
    by ppt_id;
    
    if first.ppt_id then
      chunk_start = start;
    else
    do;
      /* if greater than a day since last use then record days installed so far */
      if (start - last_used) ge 86400 then
      do;
        days_installed = sum(0, days_installed, last_used - chunk_start);
        
        /* if this is the last row for this ppt_id then add it to the days_installed */
        if last.ppt_id then
          days_installed = sum(0, days_installed, end - start);
        else
          chunk_start = start;
      end;
      else
        if last.ppt_id then
          days_installed = sum(0, days_installed, end - chunk_start);
    end;
  
    last_used = end;
  end; /* do until */

  days_installed = days_installed / 86400;
run;

 

 

 

 

 

Thanks & kind regards,

Amir.

View solution in original post

6 REPLIES 6
confooseddesi89
Quartz | Level 8

Not sure why that happened - should be fixed now.

confooseddesi89
Quartz | Level 8

Bump - any help is appreciated given I'm presenting a poster on this in <2 weeks!

Amir
PROC Star

Hi,

 

Thanks for supplying the data in a data step and the detailed explanation, a good example to other questioners.

 

The following might get you started, it appears to give the results you indicated, but might need refinement, e.g., if there is only one observation for a particular ppt_id then the code will need to be tweaked to handle that.

 

It's late where I am, so all the best!

 

 

 

data want(keep = ppt_id days_installed);
  /* process each ppt_id in a loop */
  do until (last.ppt_id);
    set duration;
    by ppt_id;
    
    if first.ppt_id then
      chunk_start = start;
    else
    do;
      /* if greater than a day since last use then record days installed so far */
      if (start - last_used) ge 86400 then
      do;
        days_installed = sum(0, days_installed, last_used - chunk_start);
        
        /* if this is the last row for this ppt_id then add it to the days_installed */
        if last.ppt_id then
          days_installed = sum(0, days_installed, end - start);
        else
          chunk_start = start;
      end;
      else
        if last.ppt_id then
          days_installed = sum(0, days_installed, end - chunk_start);
    end;
  
    last_used = end;
  end; /* do until */

  days_installed = days_installed / 86400;
run;

 

 

 

 

 

Thanks & kind regards,

Amir.

Amir
PROC Star

Hi,

 

I have revised the code to handle a single observation for a ppt_id, as well as some other tweaks:

 

 

data want2(keep = ppt_id days_installed);
  /* process each ppt_id in a loop */
  do until (last.ppt_id);
    set duration;
    by ppt_id;
    
    if first.ppt_id then
      /* check if there is only one obs for this ppt_id */
      if last.ppt_id then
        days_installed = end - start;
      else
        chunk_start = start;
    else
    do;
      /* if greater than a day since last use then record days installed so far */
      if (start - last_used) ge 86400 then
      do;
        days_installed = sum(0, days_installed, last_used - chunk_start);
        chunk_start    = start;
      end;
    end;
  
    /* if this is the last obs for this multi-obs ppt_id then add it to the days_installed */
    if last.ppt_id and not first.ppt_id then
      days_installed = sum(0, days_installed, end - chunk_start);
    else
      last_used = end;
  end; /* do until */

  days_installed = days_installed / 86400;
run;

 

 

 

To help test I added the following to the datalines - I didn't bother changing duration_s as it is not used in the calculation:

 

 

3 15Jun20:23:55:32  16Jun20:23:55:32  1077
4 15Jun20:23:55:32  16Jun20:23:55:32  1077
4 16Jun20:23:55:32  17Jun20:23:55:32  1077
5 15Jun20:23:55:32  16Jun20:23:55:32  1077
5 18Jun20:23:55:33  19Jun20:23:55:33  1077
6 15Jun20:00:00:00  15Jun20:12:00:00  1077
6 15Jun20:12:00:00  16Jun20:00:00:00  1077

 

 

 

The results are:

 

Amir_1-1694270082433.png

 

 

Thanks  & kind regards,

Amir.

 

 

Tom
Super User Tom
Super User

Sounds like you want to break the records into periods by checking for gaps between START and the previous END.

 

Not clear what your criteria is for declaring a gap too large.  For example you might use 24 hours.

data periods ;
  set duration;
  by ppt_id;
  gap = start - lag(end);
  if first.ppt_id then do ; gap=0; period=1; end;
  else if gap > '24:00't then period+1;
  format gap time12.;
  duration_h=duration_s/'01:00't ;
run;

proc summary data=periods ;
  by ppt_id period;
  output out=summary(drop=_type_) sum(duration_:)= min(start)= max(end)= ;
run;

data summary;
  set summary;
  days = 1+datepart(end)-datepart(start);
  hours_day = duration_h/days;
run;

Result

Tom_0-1694281035707.png

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 484 views
  • 1 like
  • 4 in conversation