I have a data set with IDs who have the following variables:
-a numeric variable we'll call "event" indicating event number (i.e. 1st event, 2nd event, etc.).
-a numeric variable we'll call "type" indicating event type (i.e. 1=fast, 2=slow, etc.).
-a numeric variable we'll call "cause" indicating cause of the event (i.e. 1=person 2=vehicle, etc.)
I am trying to code a variable that indicates the number of occurrences for that id with the same type and cause. So, the first appearance of that combination would be 1, the second 2, etc. I'll call this variable "same." My dataset is sorted by ID and date so "same" should reflect the first occurence of that type/cause combo, then the second and so on.
Sample code below showing what I have and what I want:
data have;
input id event type cause;
datalines;
1 1 1 1
1 2 1 2
2 1 2 2
3 1 1 2
3 2 2 1
3 3 2 1
3 4 1 2
3 5 1 2
3 6 1 1
4 1 1 1
4 2 1 1
;
RUN;
data want;
input id event type cause same;
datalines;
1 1 1 1 1
1 2 1 2 1
2 1 2 2 1
3 1 1 2 1
3 2 2 1 1
3 3 2 1 2
3 4 1 2 2
3 5 1 2 3
3 6 1 1 1
4 1 1 1 1
4 2 1 1 2
;
RUN;
Does this make sense?
As always, thanks to everyone for always helping out!
Where is the Date variable? You say that your data is sorted by Id and Date but I don't see any date variable.
For your example this works:
proc sort data=have; by id type cause; run; data want; set have; retain same; by id type cause; if first.cause then same=1; else same+1; run; proc sort data=want; by id event type cause; run;
Note that I sort the data in a different order because you want to count by "first occurence of that type/cause combo, then the second and so on. "
IF there had been a DATE variable then to enforce processing in order I would use:
proc sort data=have; by id type cause date; run;
and change the final sort order if needed. But the grouping by Type and Cause within the Id would be processed the same.
Variables that appear on a BY statement in SAS will have automatic associated temporary (meaning not written to the data set) variables indicating First or Last of a by group. Those are referenced as First.Variable or Last.Variable and have numeric values of 1 (true) or 0 (false). So can be used to do thing like resetting counters.
RETAIN identifies variables whose values are to be kept across the data step boundary, such as your counter.
One of the keys to using these is which order is important. The Event order is not important for the count so sort and use a different BY variable list.
Then resort when done if the 'event' order is important.
Where is the Date variable? You say that your data is sorted by Id and Date but I don't see any date variable.
For your example this works:
proc sort data=have; by id type cause; run; data want; set have; retain same; by id type cause; if first.cause then same=1; else same+1; run; proc sort data=want; by id event type cause; run;
Note that I sort the data in a different order because you want to count by "first occurence of that type/cause combo, then the second and so on. "
IF there had been a DATE variable then to enforce processing in order I would use:
proc sort data=have; by id type cause date; run;
and change the final sort order if needed. But the grouping by Type and Cause within the Id would be processed the same.
Variables that appear on a BY statement in SAS will have automatic associated temporary (meaning not written to the data set) variables indicating First or Last of a by group. Those are referenced as First.Variable or Last.Variable and have numeric values of 1 (true) or 0 (false). So can be used to do thing like resetting counters.
RETAIN identifies variables whose values are to be kept across the data step boundary, such as your counter.
One of the keys to using these is which order is important. The Event order is not important for the count so sort and use a different BY variable list.
Then resort when done if the 'event' order is important.
You are right, I neglected to include dates. Sorry! I assume you would just add date as a variable to sort by so I will try that.
As for the first.variable, can you explain how first.cause accounts for both type and cause being the same? Is it because first.cause indicates it is the first time this value appears given all preceding variables being a certain value?
Thank you.
Suggestion: To see how First and Last variables work assign them to variables that are not temporary so you can see how they change.
data examine; set have; retain same; by id type cause; F_id = first.id; L_id = last.id; F_type=first.type; L_type = last.type; F_cause=first.first; L_cause=last.cause; run;
When a BY variable before (to the left of) another changes the groups reset.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.