BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nickspencer
Obsidian | Level 7

I have a SAS dataset with columns ID(number), Name(Character) and other columns.

 

ID      Name         State

1        John            CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John             VA

 

If any of the name in the name column repeats or same name occurs more than once,then it should concatenate with the ID in the name column as below in the new dataset.

 

 

ID      Name         State

1        John1          CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John5           VA

 

 

What might be the best way to do this?

 

Thanks

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

 

Like this?


data REF;
input ID      Name $        State $;
cards;
1        John            CA
2        Tim             MD
3        Amy             FL
4        Seth            NM
5        John            VA
6        Amy             FL
run; 
 
data HAVE;
input ID      Name $ ;
cards;
2        Tim         
4        Seth        
5        John        
6        Amy         
run; 
 
proc sql; 
  create table WANT as 
  select have.ID
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
  from HAVE 
         left join 
      (select NAME from REF group by NAME having count(NAME) > 1) dup
        on have.NAME=dup.NAME ;
quit;

ID NAME
6  Amy (6)
5  John (5)
4  Seth
2  Tim

 

View solution in original post

18 REPLIES 18
ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE;
input ID      Name $        State $;
cards;
1        John            CA
2        Amy             FL
3        Tim             MD
4        Seth            NM
5        John            VA
run; 
 
proc sql; 
  create table DUPLICATES as 
  select NAME 
  from HAVE 
  group by NAME
  having count(NAME) > 1;
quit;

data WANT;
  set HAVE;
  if _N_=1 then do;
    dcl hash DUPLICATES(dataset:'DUPLICATES');
    DUPLICATES.definekey('NAME');
    DUPLICATES.definedone();
  end;
  if DUPLICATES.check()=0 then NAME=catt(NAME,ID);
run;

ID Name  State
1  John1 CA
2  Amy   FL
3  Tim   MD
4  Seth  NM
5  John5 VA

 

[Edit: no need for 2 steps, I started with the assumption that you wanted

 NAME=catt(NAME,_N_);

I realised it was 

 NAME=catt(NAME,ID); 

but did not change the code. 

So use one of the other single-steps solutions proposed.]

nickspencer
Obsidian | Level 7

Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.

 

ID      Name         State

1        John            CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John             VA

6         Amy             NM

 

So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.

 

ID Name

5   John

4   Seth

3   Tim

6   Amy

 

The output should be

 

ID Name

5   John (5)

4   Seth

3   Tim

6    Amy (6)

 

 

Thanks in Advance.

 

Patrick
Opal | Level 21

@nickspencer

Given your Have data and your desired result this appears to be mainly a question of "How to select the row with the same name and the highest ID?". Is that correct??

If so then why do you want to concatenate the ID to the name as well? Isn't that information you've got already in the ID column?

I also assume that you would want code which executes in-database? Right?

ChrisNZ
Tourmaline | Level 20

 

Like this?


data REF;
input ID      Name $        State $;
cards;
1        John            CA
2        Tim             MD
3        Amy             FL
4        Seth            NM
5        John            VA
6        Amy             FL
run; 
 
data HAVE;
input ID      Name $ ;
cards;
2        Tim         
4        Seth        
5        John        
6        Amy         
run; 
 
proc sql; 
  create table WANT as 
  select have.ID
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME
  from HAVE 
         left join 
      (select NAME from REF group by NAME having count(NAME) > 1) dup
        on have.NAME=dup.NAME ;
quit;

ID NAME
6  Amy (6)
5  John (5)
4  Seth
2  Tim

 

ChrisNZ
Tourmaline | Level 20

If you want the code on the reference table to execute in database,

create an intermediate table for 

select NAME from REF group by NAME having count(NAME) > 1

I doubt SAS will pass it as is, but then maybe it will.

nickspencer
Obsidian | Level 7

Hi ChrisNZ,

 

This works great but the 'have' dataset has about 50 other variables which maynot be convenient to include all in the select clause. Is there any way so I don't have to include all variables in the select clause?

 

Thanks.

ChrisNZ
Tourmaline | Level 20

Is there any way so I don't have to include all variables in the select clause?

 

You are really hard to follow. The 50 variables are not included. Only 2 variables are.

ChrisNZ
Tourmaline | Level 20

Unless you mean you want to *include* but not to explicitly *list*?

Like this?

 create table WANT(drop=NAME rename=(NAME2=NAME)) as 
  select have.*
        ,catx(' ',have.NAME,ifc(dup.NAME ne ' ', catt('(',ID,')'),' ')) as NAME2

Please try to be precise in your questions and explanations. It will help you greatly.

Throwing a few words together as they come is seldom right the first time. Treat these pages as a school essay, not a shopping list. 

 

 

nickspencer
Obsidian | Level 7

That was what I was trying to say. Thanks for your suggestion. Smiley Happy

novinosrin
Tourmaline | Level 20
data HAVE;
input ID      Name $        State $;
cards;
1        John            CA
2        Amy             FL
3        Tim             MD
4        Seth            NM
5        John            VA
run; 

data want;
  if _N_=1 then do;
  if 0 then set have;
    declare hash h(dataset:'have',multidata:'y');
    h.definekey('NAME');
    h.definedone();
  end;
set have;
if h.check()=0 then do;
count=1;
 h.has_next(result: r);
 if r ne 0 then count+1;
 if count>1 then NAME=catt(NAME,ID);
 end;
 drop count r;
 run;

Haikuo
Onyx | Level 15

No need for a 'count' variable in Hash setting:

data want_h;
 if _n_=1 then do;
   declare hash h(dataset:'have(keep=name)', multidata:'y');
   h.definekey('name');
   h.definedata(all:'y');
   h.definedone();
   end;

   set have;
   h.find(); 
   h.has_next(result:_n_); 
   if _n_ ne 0 then name=cats(name,id);

run; 
novinosrin
Tourmaline | Level 20

@Haikuo Right, Thank you and a neat catch. Can't believe that didn't cross my mind. I always lack attention to details.  😞

nickspencer
Obsidian | Level 7

Thank you. This is really helpful. But my requirement is changed a little bit. A Teradata table has the following which can be used as a reference.

 

ID      Name         State

1        John            CA

2        Amy             FL

3        Tim              MD

4        Seth             NM

5        John             VA

6         Amy             NM

 

So, if my dataset has any of the names which have occurred more than once in the above reference table. Then it should concatenate. It may not be repeating in the dataset.

 

ID Name

5   John

4   Seth

3   Tim

6   Amy

 

The output should be

 

ID Name

5   John (5)

4   Seth

3   Tim

6    Amy (6)

 

 

Thanks in Advance.

 

 

 

 

 

 

Jagadishkatam
Amethyst | Level 16
data have;
input ID      Name$         State$;
cards;
1        John            CA
2        Amy             FL
3        Tim              MD
4        Seth             NM
5        John             VA
;

proc sort data=have;
by name;
run;

data want;
do until(last.name);
set have;
by name;
retain cnt;
if first.name then cnt=1;
else cnt+1;
end;
do until(last.name);
set have;
by name;
if cnt>1 then name=cats(name,put(id,best.));
output;
end;
run; 
Thanks,
Jag

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 2223 views
  • 5 likes
  • 7 in conversation