Solved: Need to count number of instances per id

erin3 · Posted 07-27-2021 08:05 PM

Hi! I'm trying to create an output that counts the number of times an individual has had an injury. I am working with data similar to dataset have, but I want a fourth column starts counting fresh for each new id. How do I go about doing this? I'm on SAS Enterprise.

data have;
input id injury previous_injury;
datalines;
1	0	0
1	0	0
1	1	0
1	0	1
1	1	1
2	0	0
2	1	0
2	1	1
3	1	0
3	0	1
3	0	1
4	0	0
4	1	0
4	0	1
4	1	1
;
run;



data want;
input id injury previous_injury count_injury;
datalines;
1 0 0 0
1 0 0 0
1 1 0 1
1 0 1 1
1 1 1 2
2 0 0 0
2 1 0 0
2 1 1 1
3 1 0 1
3 0 1 1
3 0 1 1
4 0 0 0
4 1 0 1
4 1 1 2
4 1 1 3
;
run;

I've tried some sql coding and the data step using the retain statement. The closest I've gotten is a single count for an id number (ex. each row for id 4 would say 3).

mkeintz · Posted 07-28-2021 10:38 AM

@erin3 wrote:

Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable.

You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly.

Can the problem still be solved? 😕

Yes, most easily through use of a "summing statement":

data want;
  set have;
  by id;
  if first.id then injury_count=0;
  injury_count+injury;
run;

The statement:

  injury_count+injury;

is a "summing statement". Unlike the statement "injury_count=injury_count+injury", it primarily does two things

Adds the value of injury to the rolling total of injury_count and stores the result in injury_count. It will treat a starting value of injury_count of missing (".") as a zero if the addend injury is not also missing.
Tells SAS to retain (i.e. carry over) the injury_count value from one observation to the next. This gets the rolling total.

The "set have; by id;" statements tell SAS to expect the incoming data to be sorted by ID. It generates the temporary dummies first.id and last.id, so that you can test whether the observation in hand is at the start or end of a given BY variable group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

erin3 · Posted 07-27-2021 08:09 PM

Yikes! I'm not sure why that didn't post right. Please see this code instead.

data have;
input id injury previous_injury;
datalines;
1	0	0
1	0	0
1	1	0
1	0	1
1	1	1
2	0	0
2	1	0
2	1	1
3	1	0
3	0	1
3	0	1
4	0	0
4	1	0
4	0	1
4	1	1
;
run;

data want;
input id injury previous_injury count_injury;
datalines;
1	0	0	0
1	0	0	0
1	1	0	1
1	0	1	1
1	1	1	2
2	0	0	0
2	1	0	0
2	1	1	1
3	1	0	1
3	0	1	1
3	0	1	1
4	0	0	0
4	1	0	1
4	1	1	2
4	1	1	3
;
run;

mkeintz · Posted 07-27-2021 09:37 PM

ID2 has the following sequence:

ID	Injury	Prev_Injury
2	0	0
2	1	0
2	1	1

So why is your injury_count sequence 0 (for first row), 1 (second row), 1 (3rd row), instead of 0,1,2?

What is the role, if any, of the previous_injury column?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

erin3 · Posted 07-28-2021 09:56 AM

Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable.

You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly.

Can the problem still be solved? 😕

mkeintz · Posted 07-28-2021 10:38 AM

@erin3 wrote:

Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable.

You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly.

Can the problem still be solved? 😕

Yes, most easily through use of a "summing statement":

data want;
  set have;
  by id;
  if first.id then injury_count=0;
  injury_count+injury;
run;

The statement:

  injury_count+injury;

is a "summing statement". Unlike the statement "injury_count=injury_count+injury", it primarily does two things

Adds the value of injury to the rolling total of injury_count and stores the result in injury_count. It will treat a starting value of injury_count of missing (".") as a zero if the addend injury is not also missing.
Tells SAS to retain (i.e. carry over) the injury_count value from one observation to the next. This gets the rolling total.

The "set have; by id;" statements tell SAS to expect the incoming data to be sorted by ID. It generates the temporary dummies first.id and last.id, so that you can test whether the observation in hand is at the start or end of a given BY variable group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Re: Need to count number of instances per id

Ready to join fellow brilliant minds for the SAS Hackathon?