I have a SAS dataset with columns ID(number), Name(Character) and other columns.
ID Name State
1 John CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John VA
If any of the name in the name column repeats or same name occurs more than once,then it should concatenate with the ID in the name column as below in the new dataset.
ID Name State
1 John1 CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John5 VA
What might be the best way to do this?
Thanks
Like this?
data REF;
input ID Name $ State $;
cards;
1 John CA
2 Tim MD
3 Amy FL
4 Seth NM
5 John VA
6 Amy FL
run;
data HAVE;
input ID Name $ ;
cards;
2 Tim
4 Seth
5 John
6 Amy
run;
proc sql;
create table WANT as
select have.ID
,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
from HAVE
left join
(select NAME from REF group by NAME having count(NAME) > 1) dup
on have.NAME=dup.NAME ;
quit;
ID NAME
6 Amy (6)
5 John (5)
4 Seth
2 Tim
Like this?
data HAVE;
input ID Name $ State $;
cards;
1 John CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John VA
run;
proc sql;
create table DUPLICATES as
select NAME
from HAVE
group by NAME
having count(NAME) > 1;
quit;
data WANT;
set HAVE;
if _N_=1 then do;
dcl hash DUPLICATES(dataset:'DUPLICATES');
DUPLICATES.definekey('NAME');
DUPLICATES.definedone();
end;
if DUPLICATES.check()=0 then NAME=catt(NAME,ID);
run;
ID Name State
1 John1 CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John5 VA
[Edit: no need for 2 steps, I started with the assumption that you wanted
NAME=catt(NAME,_N_);
I realised it was
NAME=catt(NAME,ID);
but did not change the code.
So use one of the other single-steps solutions proposed.]
Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.
ID Name State
1 John CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John VA
6 Amy NM
So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.
ID Name
5 John
4 Seth
3 Tim
6 Amy
The output should be
ID Name
5 John (5)
4 Seth
3 Tim
6 Amy (6)
Thanks in Advance.
Given your Have data and your desired result this appears to be mainly a question of "How to select the row with the same name and the highest ID?". Is that correct??
If so then why do you want to concatenate the ID to the name as well? Isn't that information you've got already in the ID column?
I also assume that you would want code which executes in-database? Right?
Like this?
data REF;
input ID Name $ State $;
cards;
1 John CA
2 Tim MD
3 Amy FL
4 Seth NM
5 John VA
6 Amy FL
run;
data HAVE;
input ID Name $ ;
cards;
2 Tim
4 Seth
5 John
6 Amy
run;
proc sql;
create table WANT as
select have.ID
,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
from HAVE
left join
(select NAME from REF group by NAME having count(NAME) > 1) dup
on have.NAME=dup.NAME ;
quit;
ID NAME
6 Amy (6)
5 John (5)
4 Seth
2 Tim
If you want the code on the reference table to execute in database,
create an intermediate table for
select NAME from REF group by NAME having count(NAME) > 1
I doubt SAS will pass it as is, but then maybe it will.
Hi ChrisNZ,
This works great but the 'have' dataset has about 50 other variables which maynot be convenient to include all in the select clause. Is there any way so I don't have to include all variables in the select clause?
Thanks.
> Is there any way so I don't have to include all variables in the select clause?
You are really hard to follow. The 50 variables are not included. Only 2 variables are.
Unless you mean you want to *include* but not to explicitly *list*?
Like this?
create table WANT(drop=NAME rename=(NAME2=NAME)) as
select have.*
,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME2
Please try to be precise in your questions and explanations. It will help you greatly.
Throwing a few words together as they come is seldom right the first time. Treat these pages as a school essay, not a shopping list.
That was what I was trying to say. Thanks for your suggestion.
data HAVE;
input ID Name $ State $;
cards;
1 John CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John VA
run;
data want;
if _N_=1 then do;
if 0 then set have;
declare hash h(dataset:'have',multidata:'y');
h.definekey('NAME');
h.definedone();
end;
set have;
if h.check()=0 then do;
count=1;
h.has_next(result: r);
if r ne 0 then count+1;
if count>1 then NAME=catt(NAME,ID);
end;
drop count r;
run;
No need for a 'count' variable in Hash setting:
data want_h;
if _n_=1 then do;
declare hash h(dataset:'have(keep=name)', multidata:'y');
h.definekey('name');
h.definedata(all:'y');
h.definedone();
end;
set have;
h.find();
h.has_next(result:_n_);
if _n_ ne 0 then name=cats(name,id);
run;
@Haikuo Right, Thank you and a neat catch. Can't believe that didn't cross my mind. I always lack attention to details. 😞
Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.
ID Name State
1 John CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John VA
6 Amy NM
So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.
ID Name
5 John
4 Seth
3 Tim
6 Amy
The output should be
ID Name
5 John (5)
4 Seth
3 Tim
6 Amy (6)
Thanks in Advance.
data have;
input ID Name$ State$;
cards;
1 John CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John VA
;
proc sort data=have;
by name;
run;
data want;
do until(last.name);
set have;
by name;
retain cnt;
if first.name then cnt=1;
else cnt+1;
end;
do until(last.name);
set have;
by name;
if cnt>1 then name=cats(name,put(id,best.));
output;
end;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.