Pyrite | Level 9

## Count unique days from start day and end day for each subject from multiple records

Hi all,

I have a data like below. I am trying to count unique days for each subject. For example, for ID 101 answer should be 3, 10, 11, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27. So total days 13. Can you please help me with this logic. I have large data so I am working from morning and now I am exhausted my brain is not working. Any help would be appreciated. Thanks in advance!

``````data have;
input id \$1-10 stdy \$11-18 endy \$19-26;
datalines;
101        3        3
101        10        10
101        11        11
101        18        27
101        19        19
101        19        19
101        19        19
101        20        20
104        24        24
104        43        43
104        47        47
104        54        54
104        94        94
104        108        108
104        115        115
104        134        134
104        144        144
104        176        176
104        187        187
104        210        210
104        212        212
104        213        213
104        214        214
104        274        274
104        276        276
105        17        17
105        17        17
105        46        46
105        46        46
105        52        52
105        73        73
105        93        93
105        109        109
105        252        252
;
run;``````

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Count unique days from start day and end day for each subject from multiple records

Since for one observations you normally just need to subtract.

``num_days = end - start + 1;``

It is only if you want to aggregate of multiple observations where you need to worry about counting some days more than once.

In that case it can be very simple to just use a simple temporary array that uses the actual date as the index.  Then on each observation write 1 into the array for each day covered by that day.  Then at the end add them up.

But you need to have NUMERIC day values, not character strings.

``````data have;
input id \$ stdy endy ;
datalines;
101    3      3
101   10     10
101   11     11
101   18     27
101   19     19
101   19     19
101   19     19
101   20     20
104   24     24
104   43     43
104   47     47
104   54     54
104   94     94
104  108    108
104  115    115
104  134    134
104  144    144
104  176    176
104  187    187
104  210    210
104  212    212
104  213    213
104  214    214
104  274    274
104  276    276
105   17     17
105   17     17
105   46     46
105   46     46
105   52     52
105   73     73
105   93     93
105  109    109
105  252    252
;

data want;
set have;
by id;
array days [0:500] _temporary_;
if first.id then call missing(of days[*]);
do day=stdy to endy; days[day]=1; end;
if last.id ;
num_days = sum(0,of days[*]) ;
drop day stdy endy;
run;

proc print;
run;``````

Results

```Obs    id     num_days

1     101       13
2     104       17
3     105        7
```

It does not matter if the array is defined too large (within reason).

If you have actual dates then you just need to be a little more creative in setting the upper and lower indexes when defining the array.  They have to be actual integer values.  But you can use some macro code to make it easy enough to use the integers that SAS uses for dates.  So if you know you dates will always be between 01JAN2000 and 01JAN2030 you could use code like this to define the array.

So you might use %SYSFUNC() to call the MDY() function.

``````  array days [%sysfunc(mdy(1,1,2000)) : %sysfunc(mdy(1,1,2030))] _temporary_;
``````

Or perhaps just the %SYSEVALF() so the macro processor will recognize date literals.

`` array days [%sysevalf("01JAN2000"d) : %sysevalf("01JAN2030"d)] _temporary_;``
Super User

## Re: Count unique days from start day and end day for each subject from multiple records

Since for one observations you normally just need to subtract.

``num_days = end - start + 1;``

It is only if you want to aggregate of multiple observations where you need to worry about counting some days more than once.

In that case it can be very simple to just use a simple temporary array that uses the actual date as the index.  Then on each observation write 1 into the array for each day covered by that day.  Then at the end add them up.

But you need to have NUMERIC day values, not character strings.

``````data have;
input id \$ stdy endy ;
datalines;
101    3      3
101   10     10
101   11     11
101   18     27
101   19     19
101   19     19
101   19     19
101   20     20
104   24     24
104   43     43
104   47     47
104   54     54
104   94     94
104  108    108
104  115    115
104  134    134
104  144    144
104  176    176
104  187    187
104  210    210
104  212    212
104  213    213
104  214    214
104  274    274
104  276    276
105   17     17
105   17     17
105   46     46
105   46     46
105   52     52
105   73     73
105   93     93
105  109    109
105  252    252
;

data want;
set have;
by id;
array days [0:500] _temporary_;
if first.id then call missing(of days[*]);
do day=stdy to endy; days[day]=1; end;
if last.id ;
num_days = sum(0,of days[*]) ;
drop day stdy endy;
run;

proc print;
run;``````

Results

```Obs    id     num_days

1     101       13
2     104       17
3     105        7
```

It does not matter if the array is defined too large (within reason).

If you have actual dates then you just need to be a little more creative in setting the upper and lower indexes when defining the array.  They have to be actual integer values.  But you can use some macro code to make it easy enough to use the integers that SAS uses for dates.  So if you know you dates will always be between 01JAN2000 and 01JAN2030 you could use code like this to define the array.

So you might use %SYSFUNC() to call the MDY() function.

``````  array days [%sysfunc(mdy(1,1,2000)) : %sysfunc(mdy(1,1,2030))] _temporary_;
``````

Or perhaps just the %SYSEVALF() so the macro processor will recognize date literals.

`` array days [%sysevalf("01JAN2000"d) : %sysevalf("01JAN2030"d)] _temporary_;``
Discussion stats