I have a dataset with a variable name.
I would like to replace the repeating values that are concatenated twice or more with a '-' as the delimiter.
An example dataset below, shows that the number of elements concatenated change depending on the row. Also, not all of them have two repeating elements just the first row where "AAA" is repeated twice & the last row where "AAA" is repeated twice.
Dataset:
DATA test;
INPUT name $;
DATALINES;
AAA-AAA-A
AAA-BBB-B
AAA-AAA
;
RUN;
Desirable Output Table:
AAA-A
AAA-BBB-B
AAA
Is there a way to do this in SAS?
I started with a do loop & the scan() function, but I wasn't sure how to determine the number of elements to loop to. The first two row has three strings concatenated but the last one has two strings concatenated.
Try this
DATA test;
INPUT name :$10.;
DATALINES;
AAA-AAA-A
AAA-BBB-B
AAA-AAA
;
RUN;
data want(keep = newname);
set test;
newname=scan(name, 1, '-');
do i=2 to countw(name,'-');
word=scan(name, i, '-');
found=findw(newname, word, 'it');
if found=0 then newname=catx('-', newname, word);
end;
run;
Nice algorithm, @PeterClemmensen!
Obviously you meant the FINDW argument 'it' as modifiers -- as in
found=findw(newname, word, '-', 'it');
--, not as delimiters (which they would be in the third argument).
Something like below should allow for some variation in your data.
DATA have;
INPUT name :$100.;
DATALINES;
AAA-AAA-A
AAA-AAA-A-AA-AAA
AAA-BBB-B
AAA-AAA
AAA-BBB-AAA-AAA-CCC-BBB
AAA--AAA-AAA-CCC-BBB
;
data want(drop=_:);
if _n_=1 then
do;
length _term $40;
dcl hash h1(hashexp:3);
h1.defineKey('_term');
h1.defineDone();
end;
set have;
if 0 then name2=name;
_stop=countw(name,'-');
do _i=1 to _stop;
_term=scan(name,_i,'-');
if h1.check() ne 0 then
do;
name2=catx('-',name2,_term);
h1.ref();
end;
end;
h1.clear();
run;
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.