Hello Ballardw, thank you so much for your response! Apparently I did not think deep enough, I apologize. After confirming with my teammate, the dataset is in the format that when there are multiple rows for the same subject, at least the city would be different. The state can repeat or change back and forth, but city is ALWAYS unique (So for the same id, there will not be rows with both same state and city). So if there are 4 rows for a subject, the 4 rows would have 4 different cities. Using the example in my question and consider "state" as the variable of interest, I am hoping in general the output would be something like: id: 1 unique state: A, B; changed 1 time unique cities: A, B, C, D; changed 3 times unique salary: 100, 101, 102; changed 2 times ... similarly for other id's. The example is not perfect as I realized. Consider the state in your example with id=4 and 6 rows, the output would be: id:4 unique state: C,D; changed 5 times. salary: 120, 110, 99, 100, 88, 90; changed 5 times. (hopefully the difference for each change can also be computed). There is also a date variable which is how the data is sorted by originally, we will stick with this order. I am really stuck on how to generalize the logic since the dataset contains about 200 subjects. Any advice is welcomed, thank you in advance!
... View more