Hi all,
It has been a while since I last posted here; hopefully someone can help. I have a data set with roughly one million records, each with about 40 variables such as account numbers and things of that nature. Each record has a unique account number. However, some records share the same meter number, which is a problem for me. On top of that, each record has a meter sequence number for its meter, and records that share a meter number may have the same or different sequence numbers.
Here is an example of the data set:
account#  meter#   metersequence#
9999999   1234567  105
8888888   1234567  104
2222222   4444444  103
1111111   4444444  108
3333333   0123456  109
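A small SAS data set built from the sample rows above makes it easy to test any approach before running it on the full million records. This is only a sketch: the variable names account, meter and metersequence are hypothetical stand-ins, because SAS variable names cannot contain "#" or spaces. Meter is read as character so the leading zero in 0123456 is not lost.

```sas
/* Hypothetical names: account, meter and metersequence stand in for
   account#, meter# and metersequence#. */
data b;
   /* account and meter are character so leading zeros survive;
      metersequence is numeric so it sorts by value */
   input account $ meter $ metersequence;
   datalines;
9999999 1234567 105
8888888 1234567 104
2222222 4444444 103
1111111 4444444 108
3333333 0123456 109
;
run;
```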
For the project I am working on, I need to keep only one record for each meter#: the one with the HIGHEST metersequence#. For example, for meter# 1234567 I would want to keep the record containing metersequence# 105.
I've tried something similar to:
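proc sort data = b;
   /* meter and metersequence stand in for meter# and metersequence#,
      since SAS variable names cannot contain "#" or spaces */
   by meter descending metersequence;
run;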
How do I get from there to keeping only the top record of each meter# (assuming the sort orders the records properly)? In other words, once the file is sorted so that the highest metersequence# comes first for each meter#, how do I keep just that top record for each meter#?
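One common way to keep the top record of each group is DATA step BY-group processing with the FIRST. variable. A minimal sketch, assuming the data set is b and the variables are named meter and metersequence (hypothetical SAS-valid names for meter# and metersequence#):

```sas
/* Sort so the highest metersequence comes first within each meter. */
proc sort data=b;
   by meter descending metersequence;
run;

/* FIRST.meter is 1 on the first record of each meter group, which after
   the sort above is the record with the highest metersequence.
   The subsetting IF keeps only those records. */
data want;
   set b;
   by meter descending metersequence;
   if first.meter;
run;
```

On the five sample rows this should leave three records: metersequence 109 for meter 0123456, 105 for 1234567 and 108 for 4444444. If two records tie on the highest metersequence for the same meter, only one of them is kept, so it may be worth checking for ties first.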
Sorry for the long post. I wanted to make sure people understood what I was saying.
One more question: if I add the NODUPKEY option to the sort, which variable does it remove duplicates on? One? Both?
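NODUPKEY compares all of the BY variables together, so with `by meter descending metersequence` it only removes records that repeat both the meter and the sequence number; it would not reduce the data to one record per meter. A sketch of a two-step alternative, again assuming data set b and the hypothetical names meter and metersequence: sort descending first, then run NODUPKEY by meter alone. PROC SORT's default EQUALS option keeps records in their existing order within each BY group, so the record kept is the first one, i.e. the highest metersequence.

```sas
/* Step 1: highest metersequence first within each meter. */
proc sort data=b;
   by meter descending metersequence;
run;

/* Step 2: NODUPKEY compares only the BY variable listed here (meter),
   keeping the first record of each meter group. DUPOUT= writes the
   removed records to a separate data set so they can be reviewed. */
proc sort data=b out=want nodupkey dupout=dropped;
   by meter;
run;
```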