I have a dataset with a number of variables, including NAME.
I am trying to delete duplicate observations of NAME where one observation is 'John Smith' and another is 'smith john'. They are clearly the same person, and I want to delete the duplicate entry. What would be the most efficient way to do it?
Note that the duplicate names could occur anywhere within the dataset.
Example:
John Smith
Cal Harper
freddy Holt
smith john
frank waters
harper Cal
@jfaruqui OK, let's go linear:
data have;
input name $50.;
cards;
John Smith
Cal Harper
freddy Holt
smith john
frank waters
harper Cal
;
run;
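data t;
  set have;
  array t(50) $1 _temporary_;   /* one slot per character of NAME */
  call missing(of t(*));        /* temporary arrays are retained, so reset on every row */
  n=compress(upcase(name));     /* uppercase and remove blanks */
  do _n_=1 to length(n);
    t(_n_)=char(n,_n_);         /* copy each character into its own slot */
  end;
  call sortc(of t(*));          /* sort the characters */
  w=cats(of t(*));              /* key: the same letters always give the same string */
run;
/* keep one observation per key */
proc sort data=t out=want(drop=w n) nodupkey;
  by w;
run;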
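One caveat on the character-sorted key: because every letter of the name is sorted, two different names that happen to use the same letters (for example 'Amy Dorn' and 'Mary Don') get the same key, and one of them would be dropped. A minimal sketch that sorts whole words instead of single characters is below. It is an alternative technique, not the method posted above, and it assumes names are blank-delimited with at most 10 words; the names p and i and the 200-character length of w are illustrative choices.

data t;
  set have;
  length w $200;                   /* assumed key length */
  array p(10) $50 _temporary_;     /* assumption: at most 10 words per name */
  call missing(of p(*));           /* temporary arrays are retained across rows */
  do i=1 to min(countw(name,' '),dim(p));
    p(i)=upcase(scan(name,i,' ')); /* one uppercased word per slot */
  end;
  call sortc(of p(*));             /* sort the words; unused blank slots sort first */
  w=catx(' ',of p(*));             /* catx skips the blank slots */
  drop i;
run;
proc sort data=t out=want(drop=w) nodupkey;
  by w;
run;

With the sample data, 'John Smith' and 'smith john' both produce the key JOHN SMITH, so NODUPKEY keeps only one of them.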
I answered the same question in your other thread:
data have;
input name $50.;
cards;
John Smith
Cal Harper
freddy Holt
smith john
frank waters
harper Cal
;
run;
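data t;
  set have;
  array t(50) $1 _temporary_;
  call missing(of t(*));   /* reset the retained temporary array */
  /* write the cleaned name into memory starting at the address of t(1), in a single call */
  call pokelong(compress(upcase(name)),addrlong(t(1)),50);
  call sortc(of t(*));     /* sort the characters */
  w=cats(of t(*));         /* rebuild the key */
run;
proc sort data=t out=want(drop=w) nodupkey;
  by w;
run;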
I don't know how to merge the threads, although I could ask @Reeza / @Kurt_Bremser to help merge the duplicates.
Going forward, please keep your replies in the same thread you started.
Hang on: if you are new or relatively new to SAS, let alone the APP data management functions (the ADDRLONG and POKELONG calls above), I beg your pardon. Feel free to ignore the solutions that use them.
OK, just try the 32-bit version:
data t;
set have;
array t(50) $1 _temporary_;
call missing(of t(*));
call poke(compress(upcase(name)),addr(t(1)),50);
call sortc(of t(*));
w=cats(of t(*));
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
Test this and see if it works.
data have;
length name $50;
name='smith john';
output;
name='John smith';
output;
name='Mcdonald John';
output;
name='John Mcdonald';
output;
run;
data t;
set have;
array t(50) $1 ;
call pokelong(compress(upcase(name)),addrlong(t(1)),50);
call sortc(of t(*));
w=cats(of t(*));
drop t:;
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
With a temporary array:
data t;
set have;
array t(50) $1 _temporary_;
call missing(of t(*));
call pokelong(compress(upcase(name)),addrlong(t(1)),50);
call sortc(of t(*));
w=cats(of t(*));
run;
proc sort data=t out=want(drop=w) nodupkey;
by w;
run;
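If the original row order matters, or the sort itself is a concern on a large table, a hash object can do the de-duplication in a single pass instead of PROC SORT. This is a sketch only, not something from the posts above: it assumes the dataset t with the key variable w built in the previous step, and enough memory to hold all distinct keys. The name seen is illustrative.

data want;
  if _n_=1 then do;
    declare hash seen();      /* holds every key met so far */
    seen.defineKey('w');
    seen.defineDone();
  end;
  set t;
  if seen.check() ne 0 then do;  /* nonzero return code: key not in the hash yet */
    seen.add();                  /* remember the key */
    output;                      /* keep the first observation for this key */
  end;
  drop w;
run;

Unlike the NODUPKEY sort, this keeps the first occurrence of each name in its original position in the data.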