BookmarkSubscribeRSS Feed
narnia649
Obsidian | Level 7

I have a dataset with a variable name.

I would like to replace the repeating values that are concatenated twice or more with a '-' as the delimiter.

 

An example dataset below, shows that the number of elements concatenated change depending on the row. Also, not all of them have two repeating elements just the first row where "AAA" is repeated twice & the last row where "AAA" is repeated twice.

 

Dataset:

DATA test;
	INPUT name $;
	DATALINES;
AAA-AAA-A
AAA-BBB-B
AAA-AAA
;
RUN;

Desirable Output Table:

AAA-A

AAA-BBB-B

AAA

 

Is there a way to do this in SAS?

    I started with a do loop & the scan() function, but I wasn't sure how to determine the number of elements to loop to. The first two row has three strings concatenated but the last one has two strings concatenated.

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Try this

 

DATA test;
   INPUT name :$10.;
   DATALINES;
AAA-AAA-A
AAA-BBB-B
AAA-AAA
;
RUN;

data want(keep = newname);
   set test;
   newname=scan(name, 1, '-');
   do i=2 to countw(name,'-');
      word=scan(name, i, '-');
      found=findw(newname, word, 'it');
      if found=0 then newname=catx('-', newname, word);
   end;
run;
FreelanceReinh
Jade | Level 19

Nice algorithm, @PeterClemmensen!

 

Obviously you meant the FINDW argument 'it' as modifiers -- as in

found=findw(newname, word, '-', 'it');

--, not as delimiters (which they would be in the third argument).

Patrick
Opal | Level 21

Something like below should allow for some variation in your data.

DATA have;
	INPUT name :$100.;
	DATALINES;
AAA-AAA-A
AAA-AAA-A-AA-AAA
AAA-BBB-B
AAA-AAA
AAA-BBB-AAA-AAA-CCC-BBB
AAA--AAA-AAA-CCC-BBB

;

data want(drop=_:);
  if _n_=1 then 
    do;
      length _term $40;
      dcl hash h1(hashexp:3);
      h1.defineKey('_term');
      h1.defineDone();
    end;

  set have;
  if 0 then name2=name;
  _stop=countw(name,'-');
  do _i=1 to _stop;
    _term=scan(name,_i,'-');
    if h1.check() ne 0 then 
      do;
        name2=catx('-',name2,_term);
        h1.ref();
      end;
  end;
  h1.clear();
run;

Patrick_0-1632473116057.png

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 469 views
  • 5 likes
  • 4 in conversation