- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have two variables id and dif. I need to create sum of variable dif by id until the value of dif is 1. If dif>1 then counting will restart from there and continue until dif>1 or id is different.
id | dif | Sum |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 1 | 7 |
1 | 2 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 3 | 3 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 2 | 2 |
2 | 1 | 2 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I need to create sum of variable dif by id until the value of dif is 1
Should that really say "... until the value of dif is not equal to 1" ??
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No, it should say dif>1 because in some situations the dif can be 0.
Secondly, the sum variable should take to sum of rows.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why is the value of variable SUM of the second group equal to 6? It seems to me that the sum of variable DIF in the second group is 7.
2 + 1 + 1 + 1+ 1 + 1 = 7
Similar question applies to later groups as well
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry- I clarified it in my previous reply. The sum should be total number of rows.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This code is untested. If you want tested code, please provide the data as working SAS data step code (examples and instructions).
/* UNTESTED CODE */
data intermediate;
set have;
by id;
if first.id or dif>1 then group+1;
run;
proc summary nway data=intermediate;
class group;
var dif;
output out=counts n=sum;
run;
data want;
merge intermediate counts(keep=group sum);
by id;
run;
This assumes the original data set is sorted by ID.
Suggestion: if the value you want is a count and not a sum, do not name the variable SUM
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You seem to be making GROUPS, but you don't actually create a variable to indicate the group number.
data want;
set have;
by id;
if first.id then group=1;
else if dif > 1 then group+1;
run;
Once you have the group then counting how may observations are in each group is easy.
proc sql;
create table want2 as
select *, count(*) as count
from want
group by id,group
;
quit;
Result
Obs id dif Sum group count 1 1 1 7 1 7 2 1 1 7 1 7 3 1 1 7 1 7 4 1 1 7 1 7 5 1 1 7 1 7 6 1 1 7 1 7 7 1 1 7 1 7 8 1 1 6 2 6 9 1 1 6 2 6 10 1 1 6 2 6 11 1 1 6 2 6 12 1 1 6 2 6 13 1 2 6 2 6 14 2 1 3 1 3 15 2 1 3 1 3 16 2 1 3 1 3 17 2 1 3 2 3 18 2 1 3 2 3 19 2 3 3 2 3 20 2 1 2 3 2 21 2 2 2 3 2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Tom.
I have another condition- if dif =0 or missing then that row should not be counted in the sum.
id | dif | Sum |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 0 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 2 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
1 | 1 | 6 |
2 | 1 | 2 |
2 | . | 2 |
2 | 1 | 2 |
2 | 3 | 3 |
2 | 1 | 3 |
2 | 1 | 3 |
2 | 2 | 2 |
2 | 1 | 2 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just change the counting step. Instead of COUNT(*) you could use SUM( dif not in (0 .) )
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Tom. That gave me required clue. I modified the code to ignore 0 and missing in the dif variable.
data want;
*retain id group dif count;
set have (drop= sum);
by id;
if first.id then gr=1;
else if (dif > 2 & dif ne .) then gr+1;
count=1;
if dif = . or dif = 0 then count=.;
run;
proc sql;
create table want2 as
select *, SUM( count ) as dur
from want
group by id,gr
;
quit;
But one thing I noticed that in the want2 dataset it is changing the order of observations in the dif variable. Number 3 is appearing on the last row in dif. Is there any way to maintain the order as in the want dataset?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Pass through each GROUP twice, once to get the group size, and once to write output with the new COUNT variable. GROUP is defined as a sequence of obs ending with either the last obs for a given ID, or when the next observation has DIF>1:
data have;
input id dif sum;
datalines;
1 1 7
1 1 7
1 1 7
1 1 7
1 1 7
1 1 7
1 1 7
1 2 6
1 1 6
1 1 6
1 1 6
1 1 6
1 1 6
2 1 3
2 1 3
2 1 3
2 3 3
2 1 3
2 1 3
2 2 2
2 1 2
run;
data want (drop=_:);
do count=1 by 1 until (last.id or _next_dif>1);
set have;
by id;
merge have have (firstobs=2 keep=dif rename=(dif=_next_dif));
end;
do _n_=1 to count;
set have;
output;
end;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks.
How can I exclude counting any rows where the value of dif is either 0 or missing?