BookmarkSubscribeRSS Feed
narnia649
Obsidian | Level 7

I have a dataset with a variable name.

I would like to replace the repeating values that are concatenated twice or more with a '-' as the delimiter.

 

An example dataset below, shows that the number of elements concatenated change depending on the row. Also, not all of them have two repeating elements just the first row where "AAA" is repeated twice & the last row where "AAA" is repeated twice.

 

Dataset:

DATA test;
	INPUT name $;
	DATALINES;
AAA-AAA-A
AAA-BBB-B
AAA-AAA
;
RUN;

Desirable Output Table:

AAA-A

AAA-BBB-B

AAA

 

Is there a way to do this in SAS?

    I started with a do loop & the scan() function, but I wasn't sure how to determine the number of elements to loop to. The first two row has three strings concatenated but the last one has two strings concatenated.

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Try this

 

DATA test;
   INPUT name :$10.;
   DATALINES;
AAA-AAA-A
AAA-BBB-B
AAA-AAA
;
RUN;

data want(keep = newname);
   set test;
   newname=scan(name, 1, '-');
   do i=2 to countw(name,'-');
      word=scan(name, i, '-');
      found=findw(newname, word, 'it');
      if found=0 then newname=catx('-', newname, word);
   end;
run;
FreelanceReinh
Jade | Level 19

Nice algorithm, @PeterClemmensen!

 

Obviously you meant the FINDW argument 'it' as modifiers -- as in

found=findw(newname, word, '-', 'it');

--, not as delimiters (which they would be in the third argument).

Patrick
Opal | Level 21

Something like below should allow for some variation in your data.

DATA have;
	INPUT name :$100.;
	DATALINES;
AAA-AAA-A
AAA-AAA-A-AA-AAA
AAA-BBB-B
AAA-AAA
AAA-BBB-AAA-AAA-CCC-BBB
AAA--AAA-AAA-CCC-BBB

;

data want(drop=_:);
  if _n_=1 then 
    do;
      length _term $40;
      dcl hash h1(hashexp:3);
      h1.defineKey('_term');
      h1.defineDone();
    end;

  set have;
  if 0 then name2=name;
  _stop=countw(name,'-');
  do _i=1 to _stop;
    _term=scan(name,_i,'-');
    if h1.check() ne 0 then 
      do;
        name2=catx('-',name2,_term);
        h1.ref();
      end;
  end;
  h1.clear();
run;

Patrick_0-1632473116057.png

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 918 views
  • 5 likes
  • 4 in conversation