I have a dataset with duplicate entries for a variable, say "abc". Some values of this variable have 3, 4, 6 or 7 duplicate entries.
What I want to do is pick up only the first 2 duplicate entries, so that in the new dataset every value of "abc" has 2 entries.
What you have to do is:
1) sort your dataset by ABC;
2) read the sorted data with a Data step and create a new counter variable;
3) reset this variable to zero at the first observation of each new value of ABC;
4) increment it by 1 on every observation;
5) keep the observation only if the counter is less than or equal to 2.
[pre]
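/* Sort by abc; whatEver stands for any extra variable that decides which duplicates come first */
PROC SORT DATA = myData OUT = sortedData ;
BY abc whatEver ;
RUN ;
/* Keep only the observations whose within-group counter is 1 or 2 */
DATA first2dup (WHERE=(countObs LE 2)) ;
SET sortedData ;
BY abc ;
/* Reset the counter at the first observation of each abc value */
IF FIRST.abc THEN countObs = 0 ;
/* Sum statement: countObs is retained across observations and incremented by 1 */
countObs + 1 ;
RUN ;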
[/pre]
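To check the logic, here is a small worked example. The sample values are made up for illustration and are not from your data; the dataset and variable names are the same placeholders as above.

```sas
/* Hypothetical sample data: A has 3 entries, B has 1, C has 4 */
DATA myData ;
INPUT abc $ whatEver ;
DATALINES ;
A 3
A 1
A 2
B 5
C 9
C 7
C 8
C 6
;
RUN ;

PROC SORT DATA = myData OUT = sortedData ;
BY abc whatEver ;
RUN ;

DATA first2dup (WHERE=(countObs LE 2)) ;
SET sortedData ;
BY abc ;
IF FIRST.abc THEN countObs = 0 ;
countObs + 1 ;
RUN ;

/* Expected rows (abc whatEver countObs): A 1 1, A 2 2, B 5 1, C 6 1, C 7 2 */
PROC PRINT DATA = first2dup ;
RUN ;
```

Note that a value with fewer than 2 entries (B here) keeps only the entries it has, so "every value has 2 entries" holds only for values that start with at least 2.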
Regards.
Olivier