Hi guys,

First of all, let me clarify that I am executing my code in SAS version 8.2.

Have a look at the two scenarios below. I am basically trying to eliminate exact duplicate records from data set "abc", keeping/dropping certain variables and also applying a filter.

I understand why the first PROC SORT, on data set "abc", does not delete duplicates. It is definitely because the BY variables used do not ensure that exact duplicate records end up next to each other, and hence they do not get deleted.

What I do not understand is the output of the second PROC SORT, on data set "def". This second PROC SORT is exactly the same in both scenarios. Why, then, does it not eliminate duplicates in scenario 1 but does delete them in scenario 2? I am pretty sure I am missing something really basic here.

Scenario 1:

    proc sort data = abc (keep = var1 var2 var3 var4 var5 var6 var7 var8)
              out = def (drop = var5) noduprec;
      by var1 var2 var3 var4;
      where upcase(compbl(var5)) = "SOME TEXT" and var6 = 9999;
    run;

    NOTE: 0 duplicate observations were deleted.
    NOTE: There were 43 observations read from the data set ABC.
          WHERE (UPCASE(COMPBL(var5))='SOME TEXT') and (var6=9999);
    NOTE: The data set WORK.DEF has 43 observations and 7 variables.
    NOTE: PROCEDURE SORT used:
          real time 0.08 seconds
          cpu time 0.08 seconds

    proc sort data = def out = ghi noduprec;
      by var1 var2 var3;
    run;

    NOTE: Input data set is already sorted; it has been copied to the output data set.
    NOTE: There were 43 observations read from the data set WORK.DEF
    NOTE: The data set WORK.GHI has 43 observations and 7 variables.
    NOTE: PROCEDURE SORT used:
          real time 0.00 seconds
          cpu time 0.00 seconds

Scenario 2:

    proc sort data = abc (keep = var1 var2 var3 var4 var5 var6 var7 var8)
              out = def (drop = var5); /* NODUPREC not used here */
      by var1 var2 var3 var4;
      where upcase(compbl(var5)) = "SOME TEXT" and var6 = 9999;
    run;

    NOTE: There were 43 observations read from the data set ABC.
          WHERE (UPCASE(COMPBL(var5))='SOME TEXT') and (var6=9999);
    NOTE: The data set WORK.DEF has 43 observations and 7 variables.
    NOTE: PROCEDURE SORT used:
          real time 0.08 seconds
          cpu time 0.08 seconds

    proc sort data = def out = ghi noduprec;
      by var1 var2 var3;
    run;

    NOTE: 5 duplicate observations were deleted.
    NOTE: There were 43 observations read from the data set WORK.DEF
    NOTE: The data set WORK.GHI has 38 observations and 7 variables.
    NOTE: PROCEDURE SORT used:
          real time 0.01 seconds
          cpu time 0.01 seconds
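To illustrate the point about BY variables and adjacent duplicates, here is a minimal, self-contained sketch on a hypothetical toy data set (TOY, PARTIAL and FULL are made-up names, not part of the code above). It assumes PROC SORT's default EQUALS behavior, under which records with the same BY values keep their input order, and it assumes your release accepts BY _ALL_ (otherwise, list every variable explicitly).

```sas
/* Hypothetical toy data: rows 1 and 3 are exact duplicates,   */
/* but row 2 shares their BY key (id) and sits between them.   */
data toy;
  input id val $;
  datalines;
1 A
1 B
1 A
2 C
;
run;

/* BY id only: NODUPREC compares each record with the previous */
/* one written out, so the two (1, A) rows, separated by       */
/* (1, B), can both survive.                                   */
proc sort data=toy out=partial noduprec;
  by id;
run;

/* BY _ALL_: sorting on every variable forces exact duplicates */
/* to be adjacent, so NODUPREC removes all of them.            */
proc sort data=toy out=full noduprec;
  by _all_;
run;
```

Sorting on every variable is a common way to make NODUPREC act as a true "remove all exact duplicate records" step, at the cost of losing the original BY order.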