BookmarkSubscribeRSS Feed
BayzidurRahman
Obsidian | Level 7

 I have two variables id and dif. I need to create sum of variable dif by id until the value of dif is 1. If dif>1 then counting will restart from there and continue until dif>1 or id is different.

id

dif

Sum

1

1

7

1

1

7

1

1

7

1

1

7

1

1

7

1

1

7

1

1

7

1

2

6

1

1

6

1

1

6

1

1

6

1

1

6

1

1

6

2

1

3

2

1

3

2

1

3

2

3

3

2

1

3

2

1

3

2

2

2

2

1

2

 

11 REPLIES 11
PaigeMiller
Diamond | Level 26

 I need to create sum of variable dif by id until the value of dif is 1

 

Should that really say "... until the value of dif is not equal to 1" ??

--
Paige Miller
BayzidurRahman
Obsidian | Level 7

No, it should say dif>1 because in some situations the dif can be 0.
Secondly, the sum variable should take to sum of rows.

PaigeMiller
Diamond | Level 26

Why is the value of variable SUM of the second group equal to 6? It seems to me that the sum of variable DIF in the second group is 7.

 

2 + 1 + 1 + 1+ 1 + 1 = 7

 

Similar question applies to later groups as well

--
Paige Miller
BayzidurRahman
Obsidian | Level 7

Sorry- I clarified it in my previous reply. The sum should be total number of rows.

PaigeMiller
Diamond | Level 26

This code is untested. If you want tested code, please provide the data as working SAS data step code (examples and instructions).

 

/* UNTESTED CODE */
data intermediate;
    set have;
    by id;
    if first.id or dif>1 then group+1;
run;
proc summary nway data=intermediate;
    class group;
    var dif;
    output out=counts n=sum;
run;
data want;
    merge intermediate counts(keep=group sum);
    by id;
run;

 

This assumes the original data set is sorted by ID.

 

Suggestion: if the value you want is a count and not a sum, do not name the variable SUM

--
Paige Miller
Tom
Super User Tom
Super User

You seem to be making GROUPS, but you don't actually create a variable to indicate the group number.

data want;
  set have;
  by id;
  if first.id then group=1;
  else if dif > 1 then group+1;
run;

Once you have the group then counting how may observations are in each group is easy.

proc sql;
create table want2 as 
  select *, count(*) as count
  from want
  group by id,group
;
quit;

Result

Obs    id    dif    Sum    group    count

  1     1     1      7       1        7
  2     1     1      7       1        7
  3     1     1      7       1        7
  4     1     1      7       1        7
  5     1     1      7       1        7
  6     1     1      7       1        7
  7     1     1      7       1        7
  8     1     1      6       2        6
  9     1     1      6       2        6
 10     1     1      6       2        6
 11     1     1      6       2        6
 12     1     1      6       2        6
 13     1     2      6       2        6
 14     2     1      3       1        3
 15     2     1      3       1        3
 16     2     1      3       1        3
 17     2     1      3       2        3
 18     2     1      3       2        3
 19     2     3      3       2        3
 20     2     1      2       3        2
 21     2     2      2       3        2
BayzidurRahman
Obsidian | Level 7

Thanks Tom.

I have another condition- if dif =0 or missing then that row should not be counted in the sum.

id

dif

Sum

1

1

6

1

1

6

1

0

6

1

1

6

1

1

6

1

1

6

1

1

6

1

2

6

1

1

6

1

1

6

1

1

6

1

1

6

1

1

6

2

1

2

2

.

2

2

1

2

2

3

3

2

1

3

2

1

3

2

2

2

2

1

2

Tom
Super User Tom
Super User

Just change the counting step.  Instead of COUNT(*) you could use SUM( dif not in (0 .) )

BayzidurRahman
Obsidian | Level 7

Thanks Tom. That gave me required clue. I modified the code to ignore 0 and missing in the dif variable.

 

data want;
*retain id group dif count;
  set have (drop= sum);
  by id;
  if first.id then gr=1;
  else if (dif > 2 & dif ne .) then gr+1;
 count=1;
   if dif = . or dif = 0 then count=.;
run;

proc sql;
create table want2 as 
  select *, SUM( count ) as dur
  from want
  group by id,gr
;
quit;

But one thing I noticed that in the want2 dataset it is changing the order of observations in the dif variable. Number 3 is appearing on the last row in dif. Is there any way to maintain the order as in the want dataset?

 

 

BayzidurRahman_0-1693297367276.png

 

 

mkeintz
PROC Star

Pass through each GROUP twice, once to get the group size, and once to write output with the new COUNT variable.  GROUP is defined as a sequence of obs ending with either the last obs for a given ID, or when the next observation has DIF>1:

 

data have;
  input id dif sum;
datalines;
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2
run;
data want (drop=_:); 
  do count=1 by 1 until (last.id or _next_dif>1);
    set have;
    by id;
    merge have have (firstobs=2 keep=dif rename=(dif=_next_dif));
  end;
  do _n_=1 to count;
    set have;
    output;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
BayzidurRahman
Obsidian | Level 7

Thanks.

How can I exclude counting any rows where the value of dif is either 0 or missing?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2664 views
  • 0 likes
  • 4 in conversation