Re: Playing with duplicate entries

tejeshwar · Posted 07-17-2008 02:07 AM

I have a dataset with many duplicate entries, say for variable "abc": some values of this variable appear 3, 4, 6 or 7 times.

What I want to do is keep only the first 2 entries for each value, so that in the new dataset every value of variable "abc" has only 2 duplicate entries.

Can someone help?

Thanks

Olivier · Posted 07-17-2008 03:04 AM

Here is what you need to do:
1) sort your dataset by ABC;
2) read the sorted data in a DATA step and create a new counter variable;
3) reset this counter to zero for each new value of ABC;
4) add 1 to the counter on every observation;
5) keep the observation only if the counter is less than or equal to 2.
[pre]
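PROC SORT DATA = myData OUT = sortedData ;
  BY abc whatEver ;   /* whatEver: any variable that decides which duplicates come first */
RUN ;
DATA first2dup (WHERE=(countObs LE 2)) ;   /* write out only the first 2 rows per abc value */
  SET sortedData ;
  BY abc ;                                 /* BY creates FIRST.abc and LAST.abc */
  IF FIRST.abc THEN countObs = 0 ;         /* reset the counter for each new abc value */
  countObs + 1 ;                           /* sum statement: countObs is retained across rows */
RUN ;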
[/pre]
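To try the steps above end to end, here is a small self-contained run. The dataset contents are hypothetical test data, not from the original question: abc takes the value A three times, B once and C four times, and whatEver is just a numeric column used to order the duplicates.

```sas
/* Hypothetical sample data: A appears 3 times, B once, C 4 times */
DATA myData ;
  INPUT abc $ whatEver ;
  DATALINES ;
A 1
A 2
A 3
B 1
C 4
C 1
C 3
C 2
;
RUN ;

/* Sort so that duplicates of abc are adjacent, ordered by whatEver */
PROC SORT DATA = myData OUT = sortedData ;
  BY abc whatEver ;
RUN ;

/* Count rows within each abc group and keep only the first 2 */
DATA first2dup (WHERE=(countObs LE 2)) ;
  SET sortedData ;
  BY abc ;
  IF FIRST.abc THEN countObs = 0 ;
  countObs + 1 ;
RUN ;

PROC PRINT DATA = first2dup ;
RUN ;
```

PROC PRINT should list five rows: A with whatEver 1 and 2, B with 1, and C with 1 and 2. A value that appears only once (B here) is kept as is, so the result has at most two rows per value of abc rather than exactly two.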
Regards.
Olivier

tejeshwar · Posted 07-17-2008 06:46 AM

Thanks Olivier, this works great!
