DATA Step, Macro, Functions and more

Concatenate if duplicate

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Concatenate if duplicate

I have a SAS dataset with columns ID(number), Name(Character) and other columns.

 

ID      Name         State

1        John            CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John             VA

 

If any of the name in the name column repeats or same name occurs more than once,then it should concatenate with the ID in the name column as below in the new dataset.

 

 

ID      Name         State

1        John1          CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John5           VA

 

 

What might be the best way to do this?

 

Thanks

 

 


Accepted Solutions
Solution
2 weeks ago
PROC Star
Posts: 2,318

Re: Concatenate if duplicate

Posted in reply to nickspencer

 

Like this?


data REF;
input ID      Name $        State $;
cards;
1        John            CA
2        Tim             MD
3        Amy             FL
4        Seth            NM
5        John            VA
6        Amy             FL
run; 
 
data HAVE;
input ID      Name $ ;
cards;
2        Tim         
4        Seth        
5        John        
6        Amy         
run; 
 
proc sql; 
  create table WANT as 
  select have.ID
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
  from HAVE 
         left join 
      (select NAME from REF group by NAME having count(NAME) > 1) dup
        on have.NAME=dup.NAME ;
quit;

ID NAME
6  Amy (6)
5  John (5)
4  Seth
2  Tim

 

View solution in original post


All Replies
PROC Star
Posts: 2,318

Re: Concatenate if duplicate

[ Edited ]
Posted in reply to nickspencer

Like this?

data HAVE;
input ID      Name $        State $;
cards;
1        John            CA
2        Amy             FL
3        Tim             MD
4        Seth            NM
5        John            VA
run; 
 
proc sql; 
  create table DUPLICATES as 
  select NAME 
  from HAVE 
  group by NAME
  having count(NAME) > 1;
quit;

data WANT;
  set HAVE;
  if _N_=1 then do;
    dcl hash DUPLICATES(dataset:'DUPLICATES');
    DUPLICATES.definekey('NAME');
    DUPLICATES.definedone();
  end;
  if DUPLICATES.check()=0 then NAME=catt(NAME,ID);
run;

ID Name  State
1  John1 CA
2  Amy   FL
3  Tim   MD
4  Seth  NM
5  John5 VA

 

[Edit: no need for 2 steps, I started with the assumption that you wanted

 NAME=catt(NAME,_N_);

I realised it was 

 NAME=catt(NAME,ID); 

but did not change the code. 

So use one of the other single-steps solutions proposed.]

Occasional Contributor
Posts: 16

Re: Concatenate if duplicate

Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.

 

ID      Name         State

1        John            CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John             VA

6         Amy             NM

 

So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.

 

ID Name

5   John

4   Seth

3   Tim

6   Amy

 

The output should be

 

ID Name

5   John (5)

4   Seth

3   Tim

6    Amy (6)

 

 

Thanks in Advance.

 

Respected Advisor
Posts: 4,679

Re: Concatenate if duplicate

Posted in reply to nickspencer

@nickspencer

Given your Have data and your desired result this appears to be mainly a question of "How to select the row with the same name and the highest ID?". Is that correct??

If so then why do you want to concatenate the ID to the name as well? Isn't that information you've got already in the ID column?

I also assume that you would want code which executes in-database? Right?

Solution
2 weeks ago
PROC Star
Posts: 2,318

Re: Concatenate if duplicate

Posted in reply to nickspencer

 

Like this?


data REF;
input ID      Name $        State $;
cards;
1        John            CA
2        Tim             MD
3        Amy             FL
4        Seth            NM
5        John            VA
6        Amy             FL
run; 
 
data HAVE;
input ID      Name $ ;
cards;
2        Tim         
4        Seth        
5        John        
6        Amy         
run; 
 
proc sql; 
  create table WANT as 
  select have.ID
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
  from HAVE 
         left join 
      (select NAME from REF group by NAME having count(NAME) > 1) dup
        on have.NAME=dup.NAME ;
quit;

ID NAME
6  Amy (6)
5  John (5)
4  Seth
2  Tim

 

PROC Star
Posts: 2,318

Re: Concatenate if duplicate

If you want the code on the reference table to execute in database,

create an intermediate table for 

select NAME from REF group by NAME having count(NAME) > 1

I doubt SAS will pass it as is, but then maybe it will.

Occasional Contributor
Posts: 16

Re: Concatenate if duplicate

Hi ChrisNZ,

 

This works great but the 'have' dataset has about 50 other variables which maynot be convenient to include all in the select clause. Is there any way so I don't have to include all variables in the select clause?

 

Thanks.

PROC Star
Posts: 2,318

Re: Concatenate if duplicate

Posted in reply to nickspencer

Is there any way so I don't have to include all variables in the select clause?

 

You are really hard to follow. The 50 variables are not included. Only 2 variables are.

PROC Star
Posts: 2,318

Re: Concatenate if duplicate

Posted in reply to nickspencer

Unless you mean you want to *include* but not to explicitly *list*?

Like this?

 create table WANT(drop=NAME rename=(NAME2=NAME)) as 
  select have.*
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME2

Please try to be precise in your questions and explanations. It will help you greatly.

Throwing a few words together as they come is seldom right the first time. Treat these pages as a school essay, not a shopping list. 

 

 

Occasional Contributor
Posts: 16

Re: Concatenate if duplicate

That was what I was trying to say. Thanks for your suggestion. Smiley Happy

PROC Star
Posts: 1,584

Re: Concatenate if duplicate

Posted in reply to nickspencer
data HAVE;
input ID      Name $        State $;
cards;
1        John            CA
2        Amy             FL
3        Tim             MD
4        Seth            NM
5        John            VA
run; 

data want;
  if _N_=1 then do;
  if 0 then set have;
    declare hash h(dataset:'have',multidata:'y');
    h.definekey('NAME');
    h.definedone();
  end;
set have;
if h.check()=0 then do;
count=1;
 h.has_next(result: r);
 if r ne 0 then count+1;
 if count>1 then NAME=catt(NAME,ID);
 end;
 drop count r;
 run;

Respected Advisor
Posts: 3,167

Re: Concatenate if duplicate

Posted in reply to novinosrin

No need for a 'count' variable in Hash setting:

data want_h;
 if _n_=1 then do;
   declare hash h(dataset:'have(keep=name)', multidata:'y');
   h.definekey('name');
   h.definedata(all:'y');
   h.definedone();
   end;

   set have;
   h.find(); 
   h.has_next(result:_n_); 
   if _n_ ne 0 then name=cats(name,id);

run; 
PROC Star
Posts: 1,584

Re: Concatenate if duplicate

@Haikuo Right, Thank you and a neat catch. Can't believe that didn't cross my mind. I always lack attention to details.  Smiley Sad

Occasional Contributor
Posts: 16

Re: Concatenate if duplicate

Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.

 

ID      Name         State

1        John            CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John             VA

6         Amy             NM

 

So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.

 

ID Name

5   John

4   Seth

3   Tim

6   Amy

 

The output should be

 

ID Name

5   John (5)

4   Seth

3   Tim

6    Amy (6)

 

 

Thanks in Advance.

 

 

 

 

 

 

Trusted Advisor
Posts: 1,147

Re: Concatenate if duplicate

Posted in reply to nickspencer
data have;
input ID      Name$         State$;
cards;
1        John            CA
2        Amy             FL
3        Tim              MD
4        Seth             NM
5        John             VA
;

proc sort data=have;
by name;
run;

data want;
do until(last.name);
set have;
by name;
retain cnt;
if first.name then cnt=1;
else cnt+1;
end;
do until(last.name);
set have;
by name;
if cnt>1 then name=cats(name,put(id,best.));
output;
end;
run; 
Thanks,
Jag
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 255 views
  • 5 likes
  • 7 in conversation