Hi! I'm trying to create an output that counts the number of times an individual has had an injury. I am working with data similar to dataset have, but I want a fourth column starts counting fresh for each new id. How do I go about doing this? I'm on SAS Enterprise.
data have;
input id injury previous_injury;
datalines;
1 0 0
1 0 0
1 1 0
1 0 1
1 1 1
2 0 0
2 1 0
2 1 1
3 1 0
3 0 1
3 0 1
4 0 0
4 1 0
4 0 1
4 1 1
;
run;
data want;
input id injury previous_injury count_injury;
datalines;
1 0 0 0
1 0 0 0
1 1 0 1
1 0 1 1
1 1 1 2
2 0 0 0
2 1 0 0
2 1 1 1
3 1 0 1
3 0 1 1
3 0 1 1
4 0 0 0
4 1 0 1
4 1 1 2
4 1 1 3
;
run;
I've tried some sql coding and the data step using the retain statement. The closest I've gotten is a single count for an id number (ex. each row for id 4 would say 3).
@erin3 wrote:
Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable.
You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly.
Can the problem still be solved? 😕
Yes, most easily through use of a "summing statement":
data want;
set have;
by id;
if first.id then injury_count=0;
injury_count+injury;
run;
The statement:
injury_count+injury;
is a "summing statement". Unlike the statement "injury_count=injury_count+injury", it primarily does two things
The "set have; by id;" statements tell SAS to expect the incoming data to be sorted by ID. It generates the temporary dummies first.id and last.id, so that you can test whether the observation in hand is at the start or end of a given BY variable group.
Yikes! I'm not sure why that didn't post right. Please see this code instead.
data have; input id injury previous_injury; datalines; 1 0 0 1 0 0 1 1 0 1 0 1 1 1 1 2 0 0 2 1 0 2 1 1 3 1 0 3 0 1 3 0 1 4 0 0 4 1 0 4 0 1 4 1 1 ; run; data want; input id injury previous_injury count_injury; datalines; 1 0 0 0 1 0 0 0 1 1 0 1 1 0 1 1 1 1 1 2 2 0 0 0 2 1 0 0 2 1 1 1 3 1 0 1 3 0 1 1 3 0 1 1 4 0 0 0 4 1 0 1 4 1 1 2 4 1 1 3 ; run;
ID2 has the following sequence:
ID | Injury | Prev_Injury |
2 | 0 | 0 |
2 | 1 | 0 |
2 | 1 | 1 |
So why is your injury_count sequence 0 (for first row), 1 (second row), 1 (3rd row), instead of 0,1,2?
What is the role, if any, of the previous_injury column?
Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable.
You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly.
Can the problem still be solved? 😕
@erin3 wrote:
Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable.
You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly.
Can the problem still be solved? 😕
Yes, most easily through use of a "summing statement":
data want;
set have;
by id;
if first.id then injury_count=0;
injury_count+injury;
run;
The statement:
injury_count+injury;
is a "summing statement". Unlike the statement "injury_count=injury_count+injury", it primarily does two things
The "set have; by id;" statements tell SAS to expect the incoming data to be sorted by ID. It generates the temporary dummies first.id and last.id, so that you can test whether the observation in hand is at the start or end of a given BY variable group.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.