Fluorite | Level 6

## Conditional summation and enumeration

Hello,

I am new to SAS (using SAS Enterprise 7.1) and I struggle to solve the following problem:

Example Dataset:

data have;
input ID Diff Duration;
datalines;
1 . 10
1 2 13
1 25 14
1 5 5
1 2 7
1 2 9
2 . 3
2 2 5
2 10 7
3 . 4
;

Task: I would like to sum up Duration and give an enumeration variable for a Sequence but only if IDs are similar and Diff is not . and ❤️ (so if Diff<3 I call it a Sequence). What makes the task difficult is that when comparing to subsequent rows the enumeration variable 'Sequence' should only continue counting if the next row does not belong to the current sequence (so if Diff is not . and <3) whereas the summation 'Cummulative_Duration' should start form the beginning in that case.

So the result should look like the following table:

ID Diff Duration Sequence Cummulative_Duration ;
1 . 10 1 10
1 2 13 1 23
1 25 14 2 14
1 5 5 3 5
1 2 7 3 12
1 2 9 3 21
2 . 3 1 3
2 2 5 1 8
2 10 7 2 7
3 . 4 1 4
;

Thanks in advance for taking your time to help me solving this (I tried already half a day ...)!

1 ACCEPTED SOLUTION

Accepted Solutions
Barite | Level 11

## Re: Conditional summation and enumeration

``````data have;
input ID Diff Duration;
datalines;
1 . 10
1 2 13
1 25 14
1 5 5
1 2 7
1 2 9
2 . 3
2 2 5
2 10 7
3 . 4
;
run;

data want;

set have;
by id;

retain Sequence Cummulative_Duration;

if first.id then do;
Sequence =1;
Cummulative_Duration=Duration;
end;

else if Diff < 3 then do;
Sequence =Sequence;
Cummulative_Duration+Duration;
end;
else do;
Sequence+1;
Cummulative_Duration=Duration;
end;

run;``````
4 REPLIES 4
Barite | Level 11

## Re: Conditional summation and enumeration

``````data have;
input ID Diff Duration;
datalines;
1 . 10
1 2 13
1 25 14
1 5 5
1 2 7
1 2 9
2 . 3
2 2 5
2 10 7
3 . 4
;
run;

data want;

set have;
by id;

retain Sequence Cummulative_Duration;

if first.id then do;
Sequence =1;
Cummulative_Duration=Duration;
end;

else if Diff < 3 then do;
Sequence =Sequence;
Cummulative_Duration+Duration;
end;
else do;
Sequence+1;
Cummulative_Duration=Duration;
end;

run;``````
Tourmaline | Level 20

## Re: Conditional summation and enumeration

``````data have;
input ID Diff Duration;
datalines;
1 . 10
1 2 13
1 25 14
1 5 5
1 2 7
1 2 9
2 . 3
2 2 5
2 10 7
3 . 4
;
run;

data want;
if 0 then set have;
sequence=1;
cumulative=0;
do until(last.id);
set have;
by id;
if diff>=3 then do; sequence+1;cumulative=0;end;
cumulative+Duration;
output;
end;
run;
``````
PROC Star

## Re: Conditional summation and enumeration

x

``````data have;
input ID Diff Duration;
datalines;
1 . 10
1 2 13
1 25 14
1 5 5
1 2 7
1 2 9
2 . 3
2 2 5
2 10 7
3 . 4
;
data want;
set have;
by id;
if diff<=3 then cumulative_duration+duration;
else cumulative_duration=duration;
sequence+(diff>3);
if first.id then do;
sequence=1;
cumulative_duration=duration;
end;
run;``````

1. Notice that the "if diff<=3 then cumulative_duration+duration;" statement doesn't have an equal sign, typically needed to assign a value.   This means the expression "cumulative_duration+duration" is a summing expression, not an assignment.  The summing expression means (1) perform the summation, (2) using a RETAINED value of cumulative_duration, and (3) if cumulative_duration is missing, treat it as a zero.
2. The "sequence+(diff>3)" is a summing statement, adding 1 to sequence whenever diff>3).
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Fluorite | Level 6

## Re: Conditional summation and enumeration

Thanks for the explanations! Probably that was where my trials failed ...
Discussion stats
• 4 replies
• 926 views
• 0 likes
• 4 in conversation