Hello,

I am trying to figure out a way to determine which rows of data are duplicates of each other. It is easiest to explain my problem through an example. I have a dataset as follows:

Edge  To  From  Flow  Exposure
1     1   2     10    10
2     2   1     5     10
3     3   4     5     5
4     4   3     5     5
5     5   6     10    10
6     6   5     5     5

The Edge value is a unique identifier for each row. I am trying to match rows on the To and From columns. Two rows match when one row's To and From values are the reverse of the other's. In this sample dataset the matching pairs (by Edge ID) are:

1,2
3,4
5,6

For example, edge 1 has To=1, From=2 and edge 2 has To=2, From=1, so they are reciprocals of each other. I want to identify which edges match each other and also sum the Flow and Exposure values for the matching edges. The table I would like to end up with is:

Edge  To  From  Flow  Exposure  match  flow_sum  exposure_sum
1     1   2     10    10        1      15        20
2     2   1     5     10        1      15        20
3     3   4     5     5         2      10        10
4     4   3     5     5         2      10        10
5     5   6     10    10        3      15        15
6     6   5     5     5         3      15        15

Here the "match" variable just identifies which edges form a pair. It doesn't have to be a number; it can be any variable and can start at any value.

Is this possible? The dataset I am working with has about 300,000 rows, and most of them have pairs.

Thank you for the help.

Cheers,
Scott
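One possible approach, sketched in Python only because the post doesn't name a tool: build an order-independent key from each row's To and From values, so that (1, 2) and (2, 1) map to the same key, then group on that key. The function name `match_reciprocal` is hypothetical. This is a single pass with a dictionary, so it should scale to a few hundred thousand rows. It assumes each unordered To/From pair appears at most twice; a row with no reciprocal partner simply gets its own match id, with sums equal to its own values.

```python
from collections import defaultdict

# Sample data from the post: (edge, to, from, flow, exposure)
rows = [
    (1, 1, 2, 10, 10),
    (2, 2, 1, 5, 10),
    (3, 3, 4, 5, 5),
    (4, 4, 3, 5, 5),
    (5, 5, 6, 10, 10),
    (6, 6, 5, 5, 5),
]


def match_reciprocal(rows):
    """Hypothetical helper: give rows whose (to, from) pairs are reverses of
    each other the same match id, and attach the group's summed flow and
    exposure to every row in that group."""
    match_ids = {}                  # unordered pair -> match id (1, 2, 3, ...)
    flow_sum = defaultdict(int)     # unordered pair -> total flow
    exposure_sum = defaultdict(int) # unordered pair -> total exposure
    keys = []
    for edge, to, frm, flow, exposure in rows:
        # Order-independent key: (1, 2) and (2, 1) both become (1, 2).
        key = (min(to, frm), max(to, frm))
        if key not in match_ids:
            match_ids[key] = len(match_ids) + 1  # ids follow first appearance
        flow_sum[key] += flow
        exposure_sum[key] += exposure
        keys.append(key)
    # Second pass: append match id and group sums to each original row.
    return [
        (*row, match_ids[key], flow_sum[key], exposure_sum[key])
        for row, key in zip(rows, keys)
    ]


for row in match_reciprocal(rows):
    print(row)
```

On the sample data this reproduces the desired table above: edges 1 and 2 get match 1 with sums 15 and 20, edges 3 and 4 get match 2 with sums 10 and 10, and edges 5 and 6 get match 3 with sums 15 and 15. The same "sorted pair as key" idea carries over to any tool that can group by two computed columns (for example, the smaller and larger of To and From).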