Obsidian | Level 7

Generating id

I have two variables id and dif. I need to create sum of variable dif by id until the value of dif is 1. If dif>1 then counting will restart from there and continue until dif>1 or id is different.

 id dif Sum 1 1 7 1 1 7 1 1 7 1 1 7 1 1 7 1 1 7 1 1 7 1 2 6 1 1 6 1 1 6 1 1 6 1 1 6 1 1 6 2 1 3 2 1 3 2 1 3 2 3 3 2 1 3 2 1 3 2 2 2 2 1 2

11 REPLIES 11
Diamond | Level 26

Re: Generating id

I need to create sum of variable dif by id until the value of dif is 1

Should that really say "... until the value of dif is not equal to 1" ??

--
Paige Miller
Obsidian | Level 7

Re: Generating id

No, it should say dif>1 because in some situations the dif can be 0.
Secondly, the sum variable should take to sum of rows.

Diamond | Level 26

Re: Generating id

Why is the value of variable SUM of the second group equal to 6? It seems to me that the sum of variable DIF in the second group is 7.

2 + 1 + 1 + 1+ 1 + 1 = 7

Similar question applies to later groups as well

--
Paige Miller
Obsidian | Level 7

Re: Generating id

Sorry- I clarified it in my previous reply. The sum should be total number of rows.

Diamond | Level 26

Re: Generating id

This code is untested. If you want tested code, please provide the data as working SAS data step code (examples and instructions).

``````/* UNTESTED CODE */
data intermediate;
set have;
by id;
if first.id or dif>1 then group+1;
run;
proc summary nway data=intermediate;
class group;
var dif;
output out=counts n=sum;
run;
data want;
merge intermediate counts(keep=group sum);
by id;
run;``````

This assumes the original data set is sorted by ID.

Suggestion: if the value you want is a count and not a sum, do not name the variable SUM

--
Paige Miller
Super User

Re: Generating id

You seem to be making GROUPS, but you don't actually create a variable to indicate the group number.

``````data want;
set have;
by id;
if first.id then group=1;
else if dif > 1 then group+1;
run;``````

Once you have the group then counting how may observations are in each group is easy.

``````proc sql;
create table want2 as
select *, count(*) as count
from want
group by id,group
;
quit;``````

Result

```Obs    id    dif    Sum    group    count

1     1     1      7       1        7
2     1     1      7       1        7
3     1     1      7       1        7
4     1     1      7       1        7
5     1     1      7       1        7
6     1     1      7       1        7
7     1     1      7       1        7
8     1     1      6       2        6
9     1     1      6       2        6
10     1     1      6       2        6
11     1     1      6       2        6
12     1     1      6       2        6
13     1     2      6       2        6
14     2     1      3       1        3
15     2     1      3       1        3
16     2     1      3       1        3
17     2     1      3       2        3
18     2     1      3       2        3
19     2     3      3       2        3
20     2     1      2       3        2
21     2     2      2       3        2
```
Obsidian | Level 7

Re: Generating id

Thanks Tom.

I have another condition- if dif =0 or missing then that row should not be counted in the sum.

 id dif Sum 1 1 6 1 1 6 1 0 6 1 1 6 1 1 6 1 1 6 1 1 6 1 2 6 1 1 6 1 1 6 1 1 6 1 1 6 1 1 6 2 1 2 2 . 2 2 1 2 2 3 3 2 1 3 2 1 3 2 2 2 2 1 2
Super User

Re: Generating id

Just change the counting step.  Instead of COUNT(*) you could use SUM( dif not in (0 .) )

Obsidian | Level 7

Re: Generating id

Thanks Tom. That gave me required clue. I modified the code to ignore 0 and missing in the dif variable.

``````data want;
*retain id group dif count;
set have (drop= sum);
by id;
if first.id then gr=1;
else if (dif > 2 & dif ne .) then gr+1;
count=1;
if dif = . or dif = 0 then count=.;
run;

proc sql;
create table want2 as
select *, SUM( count ) as dur
from want
group by id,gr
;
quit;``````

But one thing I noticed that in the want2 dataset it is changing the order of observations in the dif variable. Number 3 is appearing on the last row in dif. Is there any way to maintain the order as in the want dataset?

PROC Star

Re: Generating id

Pass through each GROUP twice, once to get the group size, and once to write output with the new COUNT variable.  GROUP is defined as a sequence of obs ending with either the last obs for a given ID, or when the next observation has DIF>1:

``````data have;
input id dif sum;
datalines;
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2
run;
data want (drop=_:);
do count=1 by 1 until (last.id or _next_dif>1);
set have;
by id;
merge have have (firstobs=2 keep=dif rename=(dif=_next_dif));
end;
do _n_=1 to count;
set have;
output;
end;
run;
``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

Re: Generating id

Thanks.

How can I exclude counting any rows where the value of dif is either 0 or missing?

Discussion stats
• 11 replies
• 883 views
• 0 likes
• 4 in conversation