BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
erin3
Calcite | Level 5

Hi! I'm trying to create an output that counts the number of times an individual has had an injury. I am working with data similar to dataset have, but I want a fourth column starts counting fresh for each new id. How do I go about doing this? I'm on SAS Enterprise.

data have;
input id injury previous_injury;
datalines;
1	0	0
1	0	0
1	1	0
1	0	1
1	1	1
2	0	0
2	1	0
2	1	1
3	1	0
3	0	1
3	0	1
4	0	0
4	1	0
4	0	1
4	1	1
;
run;



data want;
input id injury previous_injury count_injury;
datalines;
1 0 0 0
1 0 0 0
1 1 0 1
1 0 1 1
1 1 1 2
2 0 0 0
2 1 0 0
2 1 1 1
3 1 0 1
3 0 1 1
3 0 1 1
4 0 0 0
4 1 0 1
4 1 1 2
4 1 1 3
;
run;

 I've tried some sql coding and the data step using the retain statement. The closest I've gotten is a single count for an id number (ex. each row for id 4 would say 3). 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@erin3 wrote:

Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable. 

You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly. 

Can the problem still be solved? 😕


Yes, most easily through use of a "summing statement":

 

data want;
  set have;
  by id;
  if first.id then injury_count=0;
  injury_count+injury;
run;

The statement:

  injury_count+injury;

is a "summing statement".  Unlike the statement "injury_count=injury_count+injury", it primarily does two things

  1. Adds the value of injury to the rolling total of injury_count and stores the result in injury_count.  It will treat a starting value of injury_count of missing (".") as a zero if the addend injury is not also missing.

  2. Tells SAS to retain (i.e. carry over) the injury_count value from one observation to the next.  This gets the rolling total.

The "set have; by id;" statements tell SAS to expect the incoming data to be sorted by ID.  It generates the temporary dummies first.id and last.id, so that you can test whether the observation in hand is at the start or end of a given BY variable group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
erin3
Calcite | Level 5

Yikes! I'm not sure why that didn't post right. Please see this code instead. 

data have;
input id injury previous_injury;
datalines;
1	0	0
1	0	0
1	1	0
1	0	1
1	1	1
2	0	0
2	1	0
2	1	1
3	1	0
3	0	1
3	0	1
4	0	0
4	1	0
4	0	1
4	1	1
;
run;

data want;
input id injury previous_injury count_injury;
datalines;
1	0	0	0
1	0	0	0
1	1	0	1
1	0	1	1
1	1	1	2
2	0	0	0
2	1	0	0
2	1	1	1
3	1	0	1
3	0	1	1
3	0	1	1
4	0	0	0
4	1	0	1
4	1	1	2
4	1	1	3
;
run;
mkeintz
PROC Star

ID2 has the following sequence:

 

ID Injury Prev_Injury
2 0 0
2 1 0
2 1 1

 

So why is your injury_count sequence    0 (for first row), 1 (second row), 1 (3rd row), instead of 0,1,2?

 

What is the role, if any, of the previous_injury column?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
erin3
Calcite | Level 5

Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable. 

You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly. 

Can the problem still be solved? 😕

mkeintz
PROC Star

@erin3 wrote:

Previous in jury was just another variable of interest in the work flow and I wasn't sure if it was useful in building the count variable. 

You're right though! The injury count sequence for id2 should be 0, 1, 2. My tired brain wrote the example incorrectly. 

Can the problem still be solved? 😕


Yes, most easily through use of a "summing statement":

 

data want;
  set have;
  by id;
  if first.id then injury_count=0;
  injury_count+injury;
run;

The statement:

  injury_count+injury;

is a "summing statement".  Unlike the statement "injury_count=injury_count+injury", it primarily does two things

  1. Adds the value of injury to the rolling total of injury_count and stores the result in injury_count.  It will treat a starting value of injury_count of missing (".") as a zero if the addend injury is not also missing.

  2. Tells SAS to retain (i.e. carry over) the injury_count value from one observation to the next.  This gets the rolling total.

The "set have; by id;" statements tell SAS to expect the incoming data to be sorted by ID.  It generates the temporary dummies first.id and last.id, so that you can test whether the observation in hand is at the start or end of a given BY variable group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 4315 views
  • 0 likes
  • 2 in conversation