I have a dataset with a number of variables, including NAME.
I am trying to delete duplicate observations of NAME where one observation is 'John Smith' and another is 'smith john'. They are clearly the same person, and I want to delete the duplicate entry. What would be the most efficient way to do this?
Note also that the duplicate names could occur anywhere within the dataset.
Example:
John Smith
Cal Harper
freddy Holt
smith john
frank waters
harper Cal
@jfaruqui OK, let's go linear:
data have;
input name $50.;
cards;
John Smith
Cal Harper
freddy Holt
smith john
frank waters
harper Cal
;
run;
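data t;
set have;
/* one array element per character of the name */
array t(50) $1 _temporary_;
/* temporary arrays are retained, so clear the array on every row */
call missing(of t(*));
/* strip blanks and uppercase so spacing and case are ignored */
n=compress(upcase(name));
do _n_=1 to length(n);
t(_n_)=char(n,_n_);
end;
/* sort the characters and join them into the dedup key w */
call sortc(of t(*));
w=cats(of t(*));
run;
/* keep one row per key; 'John Smith' and 'smith john' share the same key */
proc sort data=t out=want(drop=w n) nodupkey;
by w;
run;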
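One caveat with a character-level key: any two names that are anagrams of each other get the same key, so 'Amy Lane' and 'May Neal' would be treated as duplicates. A minimal sketch of a word-level alternative follows, assuming the names are blank-delimited and have at most five words; the array name word, the counter i, and the five-word limit are my own illustrative choices, not part of the code above.

```sas
data t;
   set have;
   length w $254;                      /* 5 words * 50 chars + 4 separators */
   /* hypothetical limit: up to 5 words per name */
   array word(5) $50 _temporary_;
   call missing(of word(*));           /* temporary arrays are retained */
   do i = 1 to min(countw(name, ' '), dim(word));
      word(i) = upcase(scan(name, i, ' '));
   end;
   call sortc(of word(*));             /* sort whole words, not characters */
   w = catx(' ', of word(*));          /* catx skips the blank elements */
run;

proc sort data=t out=want(drop=w i) nodupkey;
   by w;
run;
```

With the sample data, 'John Smith' and 'smith john' both get the key JOHN SMITH, while anagram names with different words stay distinct.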
I answered the same question in your other thread:
data have;
input name $50.;
cards;
John Smith
Cal Harper
freddy Holt
smith john
frank waters
harper Cal
;
run;
data t;
set have;
array t(50) $1 _temporary_;
call missing(of t(*));
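/* copy the cleaned name byte-for-byte into the array's memory; this assumes the 50 one-byte elements are stored contiguously */
call pokelong(compress(upcase(name)),addrlong(t(1)),50);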
call sortc(of t(*));
w=cats(of t(*));
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
I don't know how to merge the threads, although I could ask @Reeza or @Kurt_Bremser to help merge the duplicates.
Going forward, please post follow-ups in the thread you started.
Hang on: if you are new or relatively new to SAS, let alone to the APP (ADDR, PEEK, POKE) functions, I beg your pardon; ignore the use of the APP data management functions.
OK, just try the 32-bit version:
data t;
set have;
array t(50) $1 _temporary_;
call missing(of t(*));
call poke(compress(upcase(name)),addr(t(1)),50);
call sortc(of t(*));
w=cats(of t(*));
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
Test this and see if it works.
data have;
length name $50;
name='smith john';
output;
name='John smith';
output;
name='Mcdonald John';
output;
name='John Mcdonald';
output;
run;
data t;
set have;
array t(50) $1 ;
call pokelong(compress(upcase(name)),addrlong(t(1)),50);
call sortc(of t(*));
w=cats(of t(*));
drop t:;
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
With a temporary array:
data t;
set have;
array t(50) $1 _temporary_;
call missing(of t(*));
call pokelong(compress(upcase(name)),addrlong(t(1)),50);
call sortc(of t(*));
w=cats(of t(*));
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
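Since NODUPKEY discards rows silently, it may be worth reviewing what was removed before trusting the result. A small sketch, assuming the data set t built above still contains the key w; the data set name dropped is a hypothetical choice. The DUPOUT= option of PROC SORT writes the discarded observations to a separate data set.

```sas
proc sort data=t out=want(drop=w) nodupkey dupout=dropped;
   by w;
run;

/* list the rows that were treated as duplicates, with their key */
proc print data=dropped;
   var name w;
run;
```

Checking dropped by eye is a quick way to spot false matches, such as two different people whose names happen to produce the same key.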