@BrahmanandaRao wrote:
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
In an interview I was asked how to remove duplicates without sorting.
For the dataset above, the interviewer said not to change the order of the empnames, only to remove the duplicates, and to use a DATA step method only.
In general a HASH (or some other method of remembering what values you have seen before) will do this.
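data want;
  /* Build the hash object once, before the first observation is read */
  if _n_=1 then do;
    declare hash h();
    h.definekey('empname');
    h.definedone();
  end;
  set test ;
  /* find() returns 0 when the key is already stored, so a nonzero
     return code means this empname has not been seen before */
  if h.find() then do;
    output;
    h.add();
  end;
run;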
But if the data is too large, the hash approach will not work, because the hash object has to fit in memory. The same limit applies to any other DATA step only method. In that case sorting is probably your best option, either directly with PROC SORT or implicitly through PROC SQL code. Just add a new variable that records the original order so that order can be recreated afterwards.
data temp;
row+1;
set test;
run;
proc sql ;
create table want as
select empname
from temp
group by empname
having row=min(row)
order by row
;
quit;
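For comparison, the PROC SORT route mentioned above could look something like the sketch below. It reuses the TEMP dataset and its ROW variable from the previous step. The dataset names SORTED, FIRST_SEEN and WANT_SORT are only placeholders chosen for this illustration.

```sas
/* Group the names together, keeping the original order within each name */
proc sort data=temp out=sorted;
  by empname row;
run;

/* Keep only the earliest occurrence of each name */
data first_seen;
  set sorted;
  by empname;
  if first.empname;
run;

/* Restore the original order and drop the helper variable */
proc sort data=first_seen out=want_sort(drop=row);
  by row;
run;
```

This takes two sorts instead of one SQL step, but it does not rely on PROC SQL remerging summary statistics back onto the detail rows.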