Hi,

As written in the post above, the code should remove the duplicates. Check the syntax of your code:

DATA TEST;
  infile cards dlm='|' dsd;
  /* list input: $ marks character variables, the colon lets an informat work with delimited data */
  INPUT PRIMARY_KEY $ ADDRESS :$20. SEX $ AGE ZIP DISEASE EVENT $ DATE :ddmmyy10.;
cards;
AA12345|1 SAMPLE ST,NY|M|20|00000| |x|03/01/2008
AA12345|1 SAMPLE ST,NY|M|20|00000| |x|03/01/2008
AA12345|1 SAMPLE ST,NY|M|20|00000| |x|03/01/2008
|2 SAMPLE ST,FL|F|21|12345||Y|04/01/2008
|2 SAMPLE ST,FL|F|21|12345||Y|04/01/2008
|3 SAMPLE ST,CA|F|22|22222||Y|05/01/2008
BD6789||M|19|33333||Z|06/01/2008
AA12345|1SAMPLE ST,NY|M|20|00000||x|03/01/2008
|3 SAMPLE ST,CA|F|22|22222||Y|05/01/2008
;
run;

/* BY _ALL_ sorts on every variable, so identical rows end up next to each other */
PROC SORT DATA=test NODUPRECS;
  BY _ALL_;
RUN;

Log:

NOTE: There were 9 observations read from the data set WORK.TEST.
NOTE: 4 duplicate observations were deleted.
NOTE: The data set WORK.TEST has 5 observations and 8 variables.

Note that the row with the address "1SAMPLE ST,NY" (no space) is kept, because it is not identical to the "1 SAMPLE ST,NY" rows.
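If you also want to see which rows were dropped, here is a minimal sketch of one way to do it. This is my own addition, not part of the code above: it assumes WORK.TEST has been created by the DATA step, and the data set names TEST_UNIQUE and TEST_DUPS are hypothetical. It uses the OUT= and DUPOUT= options of PROC SORT so that the original data set is left untouched.

```sas
/* Sketch: keep WORK.TEST as it is and capture the removed rows.    */
/* TEST_UNIQUE and TEST_DUPS are hypothetical data set names.       */
PROC SORT DATA=test OUT=test_unique NODUPRECS DUPOUT=test_dups;
  BY _ALL_;   /* sort on every variable so identical rows are adjacent */
RUN;

/* Look at the rows that were treated as duplicates */
PROC PRINT DATA=test_dups;
RUN;
```

With the nine sample rows above, I would expect TEST_UNIQUE to hold the 5 distinct observations and TEST_DUPS the 4 deleted ones, in line with the log notes, but please check your own log to confirm.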