BookmarkSubscribeRSS Feed
JVargas
Fluorite | Level 6

How to take out duplicated observations in a data step?

5 REPLIES 5
CodingDiSASter
Fluorite | Level 6

You can use a proc sort function to remove duplicates by a given variable. In a given dataset 'have' we can remove duplicate names by:

 

Proc sort data=have nodupkey;

by name;

run;

 

The nodupkey lets you name a variable to remove duplicates named in the by statement (in this case 'name'). This code will not change the original dataset, only the output. If you need a new data set (example: a temporary set called 'want') with the duplicates removed, you can add an out statement:

 

Proc sort data=have nodupkey out=want;

by name;

run;

andreas_lds
Jade | Level 19

See Maxim 7 and 14 in Maxims of Maximally Efficient SAS Programmers 

If you have to use a data-step due to hardly comprehensible reasons, you either need to sort the data before processing it, if it is not at least grouped by the variable identifying a duplicate, or you could use a hash-object, if the dataset is not to large - it has to fit into memory available to your sas-session.

proc sort data= have out= sorted;
  by by_variables;
run;

data want;
  set sorted;
  by by_variables;
  if first.last_by_variable;
run;

If the data you have is grouped by the by-variables, then you can skip sorting and add "notsorted" to the by-statement in the data step.

Ksharp
Super User
proc sort data= have out= sorted;
by by_variables;
run;

data want_duplicated ;
set sorted;
by by_variables;
if not (first.last_by_variable and last.last_by_variable ) ;
run;
hswdl01
Fluorite | Level 6

You could do this in a PROC SORT step using nodupkey, but if you want to specifically do it in a data step you could run code similar to:

DATA WORK.data;
	BY	var;
		IF	FIRST.last_var;
	RUN;

that should get rid of the duplicates.

morganmetzger
Calcite | Level 5

You can use the "nodupkey" function.

Ex:

PROC SORT

DATA = WORK.libname NODUPKEY;

BY variable;

RUN;

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 763 views
  • 2 likes
  • 6 in conversation