DATA Step, Macro, Functions and more

Playing with duplicate entries

Contributor
Posts: 59


I have a dataset with duplicate entries for one variable, say "abc". Some values of this variable occur 3, 4, 6 or 7 times.

What I want to do is keep only the first 2 entries for each value, so that in the new dataset every value of "abc" appears at most twice.

Can someone help?

Thanks
Super Contributor
Posts: 260

Re: Playing with duplicate entries

What you have to do is: 1) sort your dataset by ABC; 2) read the sorted data in a DATA step with a BY statement and create a counter variable; 3) reset the counter to 0 at the first observation of each new value of ABC; 4) add 1 to the counter on every observation; 5) keep the observation only if the counter is less than or equal to 2.
[pre]
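/* 1) Sort so that duplicates of ABC are adjacent.                */
/*    whatEver is a placeholder for a variable that decides which */
/*    entries come "first"; with BY abc alone, PROC SORT keeps    */
/*    the original order within each group (EQUALS is default).   */
PROC SORT DATA = myData OUT = sortedData ;
BY abc whatEver ;
RUN ;
/* 2) Number the observations within each value of ABC and keep  */
/*    only the first two in the output data set.                  */
DATA first2dup (WHERE=(countObs LE 2)) ;
SET sortedData ;
BY abc ;
IF FIRST.abc THEN countObs = 0 ; /* reset at each new ABC value  */
countObs + 1 ;                   /* sum statement: retained, +1  */
RUN ;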
[/pre]
Regards.
Olivier
Contributor
Posts: 59

Re: Playing with duplicate entries

Thanks Olivier, this works great!