I have two variables id and dif. I need to create sum of variable dif by id until the value of dif is 1. If dif>1 then counting will restart from there and continue until dif>1 or id is different.
id | dif | Sum |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 2 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 3 | 3 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 2 | 2 |
2 | 1 | 2 |
I need to create sum of variable dif by id until the value of dif is 1
Should that really say "... until the value of dif is not equal to 1" ??
No, it should say dif>1 because in some situations the dif can be 0.
Secondly, the sum variable should take to sum of rows.
Why is the value of variable SUM of the second group equal to 6? It seems to me that the sum of variable DIF in the second group is 7.
2 + 1 + 1 + 1+ 1 + 1 = 7
Similar question applies to later groups as well
Sorry- I clarified it in my previous reply. The sum should be total number of rows.
This code is untested. If you want tested code, please provide the data as working SAS data step code (examples and instructions).
/* UNTESTED CODE */
data intermediate;
set have;
by id;
if first.id or dif>1 then group+1;
run;
proc summary nway data=intermediate;
class group;
var dif;
output out=counts n=sum;
run;
data want;
merge intermediate counts(keep=group sum);
by id;
run;
This assumes the original data set is sorted by ID.
Suggestion: if the value you want is a count and not a sum, do not name the variable SUM
You seem to be making GROUPS, but you don't actually create a variable to indicate the group number.
data want;
set have;
by id;
if first.id then group=1;
else if dif > 1 then group+1;
run;
Once you have the group then counting how may observations are in each group is easy.
proc sql;
create table want2 as
select *, count(*) as count
from want
group by id,group
;
quit;
Result
Obs id dif Sum group count 1 1 1 7 1 7 2 1 1 7 1 7 3 1 1 7 1 7 4 1 1 7 1 7 5 1 1 7 1 7 6 1 1 7 1 7 7 1 1 7 1 7 8 1 1 6 2 6 9 1 1 6 2 6 10 1 1 6 2 6 11 1 1 6 2 6 12 1 1 6 2 6 13 1 2 6 2 6 14 2 1 3 1 3 15 2 1 3 1 3 16 2 1 3 1 3 17 2 1 3 2 3 18 2 1 3 2 3 19 2 3 3 2 3 20 2 1 2 3 2 21 2 2 2 3 2
Thanks Tom.
I have another condition- if dif =0 or missing then that row should not be counted in the sum.
id | dif | Sum |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 0 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 2 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
2 | 1 | 2 |
2 | . | 2 |
2 | 1 | 2 |
2 | 3 | 3 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 2 | 2 |
2 | 1 | 2 |
Just change the counting step. Instead of COUNT(*) you could use SUM( dif not in (0 .) )
Thanks Tom. That gave me required clue. I modified the code to ignore 0 and missing in the dif variable.
data want;
*retain id group dif count;
set have (drop= sum);
by id;
if first.id then gr=1;
else if (dif > 2 & dif ne .) then gr+1;
count=1;
if dif = . or dif = 0 then count=.;
run;
proc sql;
create table want2 as
select *, SUM( count ) as dur
from want
group by id,gr
;
quit;
But one thing I noticed that in the want2 dataset it is changing the order of observations in the dif variable. Number 3 is appearing on the last row in dif. Is there any way to maintain the order as in the want dataset?
Pass through each GROUP twice, once to get the group size, and once to write output with the new COUNT variable. GROUP is defined as a sequence of obs ending with either the last obs for a given ID, or when the next observation has DIF>1:
data have;
input id dif sum;
datalines;
1 1 7
1 1 7
1 1 7
1 1 7
1 1 7
1 1 7
1 1 7
1 2 6
1 1 6
1 1 6
1 1 6
1 1 6
1 1 6
2 1 3
2 1 3
2 1 3
2 3 3
2 1 3
2 1 3
2 2 2
2 1 2
run;
data want (drop=_:);
do count=1 by 1 until (last.id or _next_dif>1);
set have;
by id;
merge have have (firstobs=2 keep=dif rename=(dif=_next_dif));
end;
do _n_=1 to count;
set have;
output;
end;
run;
Thanks.
How can I exclude counting any rows where the value of dif is either 0 or missing?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.