BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Pyrite | Level 9

I have a data set with IDs who have the following variables: 

-a numeric variable we'll call "event" indicating event number (i.e. 1st event, 2nd event, etc.).

-a numeric variable we'll call "type" indicating event type (i.e. 1=fast, 2=slow, etc.). 

-a numeric variable we'll call "cause" indicating cause of the event (i.e. 1=person 2=vehicle, etc.)

 

I am trying to code a variable that indicates the number of occurrences for that id with the same type and cause. So, the first appearance of that combination would be 1, the second 2, etc. I'll call this variable "same." My dataset is sorted by ID and date so "same" should reflect the first occurence of that type/cause combo, then the second and so on. 

 

Sample code below showing what I have and what I want: 

data have;
input id event type cause;
datalines;
1 1 1 1
1 2 1 2
2 1 2 2
3 1 1 2
3 2 2 1
3 3 2 1
3 4 1 2
3 5 1 2
3 6 1 1
4 1 1 1
4 2 1 1
;
RUN;

data want;
input id event type cause same;
datalines;
1 1 1 1 1
1 2 1 2 1
2 1 2 2 1
3 1 1 2 1
3 2 2 1 1
3 3 2 1 2
3 4 1 2 2
3 5 1 2 3
3 6 1 1 1
4 1 1 1 1
4 2 1 1 2
;
RUN;

Does this make sense? 

 

As always, thanks to everyone for always helping out!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Where is the Date variable? You say that your data is sorted by Id and Date but I don't see any date variable.

 

For your example this works:

proc sort data=have;
   by id type cause;
run;

data want;
   set have;
   retain same;
   by id type cause;
   if first.cause then same=1;
   else same+1;
run;

proc sort data=want;
  by id event type cause;
run;

Note that I sort the data in a different order because you want to count by "first occurence of that type/cause combo, then the second and so on. "

 

IF there had been a DATE variable then to enforce processing in order I would use:

proc sort data=have;
   by id type cause date;
run;

and change the final sort order if needed. But the grouping by Type and Cause within the Id would be processed the same.

 

Variables that appear on a BY statement in SAS will have automatic associated temporary (meaning not written to the data set) variables indicating First or Last of a by group. Those are referenced as First.Variable or Last.Variable and have numeric values of 1 (true) or 0 (false). So can be used to do thing like resetting counters.

RETAIN identifies variables whose values are to be kept across the data step boundary, such as your counter.

One of the keys to using these is which order is important. The Event order is not important for the count so sort and use a different BY variable list.

Then resort when done if the 'event' order is important.

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

Where is the Date variable? You say that your data is sorted by Id and Date but I don't see any date variable.

 

For your example this works:

proc sort data=have;
   by id type cause;
run;

data want;
   set have;
   retain same;
   by id type cause;
   if first.cause then same=1;
   else same+1;
run;

proc sort data=want;
  by id event type cause;
run;

Note that I sort the data in a different order because you want to count by "first occurence of that type/cause combo, then the second and so on. "

 

IF there had been a DATE variable then to enforce processing in order I would use:

proc sort data=have;
   by id type cause date;
run;

and change the final sort order if needed. But the grouping by Type and Cause within the Id would be processed the same.

 

Variables that appear on a BY statement in SAS will have automatic associated temporary (meaning not written to the data set) variables indicating First or Last of a by group. Those are referenced as First.Variable or Last.Variable and have numeric values of 1 (true) or 0 (false). So can be used to do thing like resetting counters.

RETAIN identifies variables whose values are to be kept across the data step boundary, such as your counter.

One of the keys to using these is which order is important. The Event order is not important for the count so sort and use a different BY variable list.

Then resort when done if the 'event' order is important.

 

 

sasgorilla
Pyrite | Level 9

You are right, I neglected to include dates. Sorry! I assume you would just add date as a variable to sort by so I will try that.

As for the first.variable, can you explain how first.cause accounts for both type and cause being the same? Is it because first.cause indicates it is the first time this value appears given all preceding variables being a certain value? 

Thank you.

ballardw
Super User

 

Suggestion: To see how First and Last variables work assign them to variables that are not temporary so you can see how they change.

data examine;
   set have;
   retain same;
   by id type cause;
   F_id = first.id;
   L_id = last.id;
   F_type=first.type;
   L_type = last.type;
   F_cause=first.first;
   L_cause=last.cause;

run;

When a BY variable before (to the left of) another changes the groups reset.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1327 views
  • 2 likes
  • 2 in conversation