Hi all,
I have a data like below. I am trying to count unique days for each subject. For example, for ID 101 answer should be 3, 10, 11, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27. So total days 13. Can you please help me with this logic. I have large data so I am working from morning and now I am exhausted my brain is not working. Any help would be appreciated. Thanks in advance!
data have;
input id $1-10 stdy $11-18 endy $19-26;
datalines;
101 3 3
101 10 10
101 11 11
101 18 27
101 19 19
101 19 19
101 19 19
101 20 20
104 24 24
104 43 43
104 47 47
104 54 54
104 94 94
104 108 108
104 115 115
104 134 134
104 144 144
104 176 176
104 187 187
104 210 210
104 212 212
104 213 213
104 214 214
104 274 274
104 276 276
105 17 17
105 17 17
105 46 46
105 46 46
105 52 52
105 73 73
105 93 93
105 109 109
105 252 252
;
run;
Since for one observations you normally just need to subtract.
num_days = end - start + 1;
It is only if you want to aggregate of multiple observations where you need to worry about counting some days more than once.
In that case it can be very simple to just use a simple temporary array that uses the actual date as the index. Then on each observation write 1 into the array for each day covered by that day. Then at the end add them up.
But you need to have NUMERIC day values, not character strings.
data have;
input id $ stdy endy ;
datalines;
101 3 3
101 10 10
101 11 11
101 18 27
101 19 19
101 19 19
101 19 19
101 20 20
104 24 24
104 43 43
104 47 47
104 54 54
104 94 94
104 108 108
104 115 115
104 134 134
104 144 144
104 176 176
104 187 187
104 210 210
104 212 212
104 213 213
104 214 214
104 274 274
104 276 276
105 17 17
105 17 17
105 46 46
105 46 46
105 52 52
105 73 73
105 93 93
105 109 109
105 252 252
;
data want;
set have;
by id;
array days [0:500] _temporary_;
if first.id then call missing(of days[*]);
do day=stdy to endy; days[day]=1; end;
if last.id ;
num_days = sum(0,of days[*]) ;
drop day stdy endy;
run;
proc print;
run;
Results
Obs id num_days 1 101 13 2 104 17 3 105 7
It does not matter if the array is defined too large (within reason).
If you have actual dates then you just need to be a little more creative in setting the upper and lower indexes when defining the array. They have to be actual integer values. But you can use some macro code to make it easy enough to use the integers that SAS uses for dates. So if you know you dates will always be between 01JAN2000 and 01JAN2030 you could use code like this to define the array.
So you might use %SYSFUNC() to call the MDY() function.
array days [%sysfunc(mdy(1,1,2000)) : %sysfunc(mdy(1,1,2030))] _temporary_;
Or perhaps just the %SYSEVALF() so the macro processor will recognize date literals.
array days [%sysevalf("01JAN2000"d) : %sysevalf("01JAN2030"d)] _temporary_;
Since for one observations you normally just need to subtract.
num_days = end - start + 1;
It is only if you want to aggregate of multiple observations where you need to worry about counting some days more than once.
In that case it can be very simple to just use a simple temporary array that uses the actual date as the index. Then on each observation write 1 into the array for each day covered by that day. Then at the end add them up.
But you need to have NUMERIC day values, not character strings.
data have;
input id $ stdy endy ;
datalines;
101 3 3
101 10 10
101 11 11
101 18 27
101 19 19
101 19 19
101 19 19
101 20 20
104 24 24
104 43 43
104 47 47
104 54 54
104 94 94
104 108 108
104 115 115
104 134 134
104 144 144
104 176 176
104 187 187
104 210 210
104 212 212
104 213 213
104 214 214
104 274 274
104 276 276
105 17 17
105 17 17
105 46 46
105 46 46
105 52 52
105 73 73
105 93 93
105 109 109
105 252 252
;
data want;
set have;
by id;
array days [0:500] _temporary_;
if first.id then call missing(of days[*]);
do day=stdy to endy; days[day]=1; end;
if last.id ;
num_days = sum(0,of days[*]) ;
drop day stdy endy;
run;
proc print;
run;
Results
Obs id num_days 1 101 13 2 104 17 3 105 7
It does not matter if the array is defined too large (within reason).
If you have actual dates then you just need to be a little more creative in setting the upper and lower indexes when defining the array. They have to be actual integer values. But you can use some macro code to make it easy enough to use the integers that SAS uses for dates. So if you know you dates will always be between 01JAN2000 and 01JAN2030 you could use code like this to define the array.
So you might use %SYSFUNC() to call the MDY() function.
array days [%sysfunc(mdy(1,1,2000)) : %sysfunc(mdy(1,1,2030))] _temporary_;
Or perhaps just the %SYSEVALF() so the macro processor will recognize date literals.
array days [%sysevalf("01JAN2000"d) : %sysevalf("01JAN2030"d)] _temporary_;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.