Solved: Re: Concatenate if duplicate

nickspencer · Posted 06-06-2018 10:07 PM

I have a SAS dataset with columns ID(number), Name(Character) and other columns.

ID Name State

1 John CA

2 Amy FL

3 Tim MD

4 Seth NM

5 John VA

If any of the name in the name column repeats or same name occurs more than once,then it should concatenate with the ID in the name column as below in the new dataset.

ID Name State

1 John1 CA

2 Amy FL

3 Tim MD

4 Seth NM

5 John5 VA

What might be the best way to do this?

Thanks

ChrisNZ · Posted 06-10-2018 10:21 PM

Like this?


data REF;
input ID      Name $        State $;
cards;
1        John            CA
2        Tim             MD
3        Amy             FL
4        Seth            NM
5        John            VA
6        Amy             FL
run; 
 
data HAVE;
input ID      Name $ ;
cards;
2        Tim         
4        Seth        
5        John        
6        Amy         
run; 
 
proc sql; 
  create table WANT as 
  select have.ID
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
  from HAVE 
         left join 
      (select NAME from REF group by NAME having count(NAME) > 1) dup
        on have.NAME=dup.NAME ;
quit;

ID NAME
6 Amy (6)
5 John (5)
4 Seth
2 Tim

High-Performance SAS Coding - Third Edition

View solution in original post

ChrisNZ · Posted 06-06-2018 11:06 PM

Like this?

data HAVE;
input ID      Name $        State $;
cards;
1        John            CA
2        Amy             FL
3        Tim             MD
4        Seth            NM
5        John            VA
run; 
 
proc sql; 
  create table DUPLICATES as 
  select NAME 
  from HAVE 
  group by NAME
  having count(NAME) > 1;
quit;

data WANT;
  set HAVE;
  if _N_=1 then do;
    dcl hash DUPLICATES(dataset:'DUPLICATES');
    DUPLICATES.definekey('NAME');
    DUPLICATES.definedone();
  end;
  if DUPLICATES.check()=0 then NAME=catt(NAME,ID);
run;

ID Name State
1 John1 CA
2 Amy FL
3 Tim MD
4 Seth NM
5 John5 VA

[Edit: no need for 2 steps, I started with the assumption that you wanted

NAME=catt(NAME,_N_);

I realised it was

NAME=catt(NAME,ID);

but did not change the code.

So use one of the other single-steps solutions proposed.]

High-Performance SAS Coding - Third Edition

nickspencer · Posted 06-10-2018 04:28 PM

Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.

ID Name State

1 John CA

2 Amy FL

3 Tim MD

4 Seth NM

5 John VA

6 Amy NM

So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.

ID Name

5 John

4 Seth

3 Tim

6 Amy

The output should be

ID Name

5 John (5)

4 Seth

3 Tim

6 Amy (6)

Thanks in Advance.

Patrick · Posted 06-10-2018 09:56 PM

@nickspencer

Given your Have data and your desired result this appears to be mainly a question of "How to select the row with the same name and the highest ID?". Is that correct??

If so then why do you want to concatenate the ID to the name as well? Isn't that information you've got already in the ID column?

I also assume that you would want code which executes in-database? Right?

ChrisNZ · Posted 06-10-2018 10:21 PM

Like this?


data REF;
input ID      Name $        State $;
cards;
1        John            CA
2        Tim             MD
3        Amy             FL
4        Seth            NM
5        John            VA
6        Amy             FL
run; 
 
data HAVE;
input ID      Name $ ;
cards;
2        Tim         
4        Seth        
5        John        
6        Amy         
run; 
 
proc sql; 
  create table WANT as 
  select have.ID
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
  from HAVE 
         left join 
      (select NAME from REF group by NAME having count(NAME) > 1) dup
        on have.NAME=dup.NAME ;
quit;

ID NAME
6 Amy (6)
5 John (5)
4 Seth
2 Tim

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 06-10-2018 10:24 PM

If you want the code on the reference table to execute in database,

create an intermediate table for

select NAME from REF group by NAME having count(NAME) > 1

I doubt SAS will pass it as is, but then maybe it will.

High-Performance SAS Coding - Third Edition

nickspencer · Posted 06-10-2018 11:06 PM

Hi ChrisNZ,

This works great but the 'have' dataset has about 50 other variables which maynot be convenient to include all in the select clause. Is there any way so I don't have to include all variables in the select clause?

Thanks.

ChrisNZ · Posted 06-10-2018 11:16 PM

> Is there any way so I don't have to include all variables in the select clause?

You are really hard to follow. The 50 variables are not included. Only 2 variables are.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 06-10-2018 11:21 PM

Unless you mean you want to *include* but not to explicitly *list*?

Like this?

 create table WANT(drop=NAME rename=(NAME2=NAME)) as 
  select have.*
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME2

Please try to be precise in your questions and explanations. It will help you greatly.

Throwing a few words together as they come is seldom right the first time. Treat these pages as a school essay, not a shopping list.

High-Performance SAS Coding - Third Edition

nickspencer · Posted 06-10-2018 11:30 PM

That was what I was trying to say. Thanks for your suggestion.

novinosrin · Posted 06-06-2018 11:48 PM

data HAVE;
input ID      Name $        State $;
cards;
1        John            CA
2        Amy             FL
3        Tim             MD
4        Seth            NM
5        John            VA
run; 

data want;
  if _N_=1 then do;
  if 0 then set have;
    declare hash h(dataset:'have',multidata:'y');
    h.definekey('NAME');
    h.definedone();
  end;
set have;
if h.check()=0 then do;
count=1;
 h.has_next(result: r);
 if r ne 0 then count+1;
 if count>1 then NAME=catt(NAME,ID);
 end;
 drop count r;
 run;

Haikuo · Posted 06-07-2018 09:21 AM

No need for a 'count' variable in Hash setting:

data want_h;
 if _n_=1 then do;
   declare hash h(dataset:'have(keep=name)', multidata:'y');
   h.definekey('name');
   h.definedata(all:'y');
   h.definedone();
   end;

   set have;
   h.find(); 
   h.has_next(result:_n_); 
   if _n_ ne 0 then name=cats(name,id);

run;

novinosrin · Posted 06-07-2018 10:01 AM

@Haikuo Right, Thank you and a neat catch. Can't believe that didn't cross my mind. I always lack attention to details. 😞

nickspencer · Posted 06-10-2018 04:23 PM

Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.

ID Name State

1 John CA

2 Amy FL

3 Tim MD

4 Seth NM

5 John VA

6 Amy NM

So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.

ID Name

5 John

4 Seth

3 Tim

6 Amy

The output should be

ID Name

5 John (5)

4 Seth

3 Tim

6 Amy (6)

Thanks in Advance.

Jagadishkatam · Posted 06-06-2018 11:56 PM

data have;
input ID      Name$         State$;
cards;
1        John            CA
2        Amy             FL
3        Tim              MD
4        Seth             NM
5        John             VA
;

proc sort data=have;
by name;
run;

data want;
do until(last.name);
set have;
by name;
retain cnt;
if first.name then cnt=1;
else cnt+1;
end;
do until(last.name);
set have;
by name;
if cnt>1 then name=cats(name,put(id,best.));
output;
end;
run;

Thanks,
Jag

Register Today!

SAS Training: Just a Click Away