I have a dataset with 16 million records and 64 variables, 2 of which I am looking to use to subset the data. Of the two, call them x and y, x has duplicates and I am looking to choose the one record of each unique x based on which one has the highest value of y. I know I can do this with either proc sql, using a "group by" approach or to sort first and then use a set statement with first.y etc approach. My concern here is which approach is generally considered more efficient ? I used to run away from all sorts until I realized that sometimes, a Proc SQL approach could be equally time-consuming. Any insights would be greatly appreciated.