Hi, I have a dataset in which each row is a person's score for a given event on a given date. People have 5 events in each day, as well as a total score which incorporates the 5 event scores but is not a straight sum. People can be in the dataset on multiple days. ID Date Event Score 1 10/1/2012 1 5 1 10/1/2012 2 3 1 10/1/2012 3 7 1 10/1/2012 4 2 1 10/1/2012 5 3 1 10/1/2012 tot 17 1 10/1/2012 tot 14 2 9/7/2014 1 3 2 9/7/2014 1 null 2 9/7/2014 2 7 2 9/7/2014 2 null I'm trying to clean the data but it has 3 problems: 1. One person will have multiple sets of events on one day, in which one set has values and one set is null/missing 2. One person will have multiple sets of events on one day, in which both sets have identical values 3. One person will have multiple sets of events in one day, in which both sets have different values This is my solution to 1 and 2: Proc sort data=have; By ID date event descending score; Run; Data want; Set have; By ID date event descending score; If first.event output want; Run; That should output the first of two identical rows or a non-missing row if one set of responses has values and the other set is missing. However, it does not address the third problem. I know I can't get rid of a row at random so if I could maybe create a flag or some sort of indicator that would show that the person had two distinct scores for that event, that would be helpful. Additionally, when I was going through the data, the ones that I noticed people having multiple scores on the same day for were the total scores. If event=tot were the only ones for which a person had two distinct values of score, I could just delete the event=tot rows altogether, transpose the data, then calculate the total scores myself. However, I would need to be able to verify that it was only the event=tot rows that had two distinct responses, and I'm not sure how to do that. Any help is much appreciated.
... View more