Hello Everyone,
Please find below the input and the desired output data, where repeated (duplicate) letters are removed from each name. For a few records this can be done with tranwrd, but how can it be done for a huge dataset? Thanks in advance!
Input data:
Field_Id | Name |
1 | Josse |
2 | Aangeyy |
3 | Suusa |
4 | Henryy |
5 | Johnn |
6 | Joose |
Output data:
Field_Id | Name |
1 | Jose |
2 | Angey |
3 | Susa |
4 | Henry |
5 | John |
6 | Jose |
What to do when there are legitimate repeated letters in a name, such as the name Matthew?
The whole task looks like homework, because in a real-world scenario there are names with legitimately repeated characters, like Otto or Lynn.
A regular expression in prxchange can solve the problem:
data want;
  set have;
  Name = prxchange('s/(.)\1+/$1/', -1, Name);
run;
Maybe the expression needs to be explained 😉
(.) matches any single char; the parentheses create a capture group.
\1+ matches one or more further occurrences of whatever the first capture group matched.
$1 returns the first capture group, i.e. a single copy of the duplicated char.
-1 replaces every match in the string, not just the first one.
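To try the pieces explained above, here is a minimal sketch that builds the sample table from the question and applies the same pattern. The dataset names have and want follow the thread; the $20. length for Name is an assumption, since the original post does not state one.

```sas
* Sample input from the question. The length of 20 for Name is assumed;
data have;
  input Field_Id Name :$20.;
  datalines;
1 Josse
2 Aangeyy
3 Suusa
4 Henryy
5 Johnn
6 Joose
;
run;

data want;
  set have;
  * Collapse each run of one repeated character into a single copy;
  Name = prxchange('s/(.)\1+/$1/', -1, Name);
run;

proc print data=want noobs;
run;
```

Written like this, the backreference \1 is compared case-sensitively, so an uppercase letter followed by the same letter in lowercase is not treated as a repeat.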
Hi @andreas_lds
It doesn't give the correct output. Didn't you mean something like this?
data want;
  set have;
  * Lowercase the name, collapse repeated characters, then convert the result to proper case;
  Name = propcase(prxchange('s/(.)\1+/$1/', -1, lowcase(Name)));
run;
@hhinohar: Seems that I forgot to add the "i" modifier to the regular expression, which makes the match case-insensitive:
data want;
  set have;
  Name = prxchange('s/(.)\1+/$1/i', -1, Name);
run;
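For the two concerns raised earlier in the thread (huge data and names with legitimate double letters such as Otto, Lynn or Matthew), one possible variation is sketched below. It compiles the pattern once with prxparse and skips names found in an exception list. The dataset names have_demo and want_demo, the variable re, and the hard-coded exception list are all hypothetical and only serve as illustration; in practice the list would probably come from a lookup table.

```sas
* Hypothetical sample mixing typos with names that legitimately contain double letters;
data have_demo;
  input Field_Id Name :$20.;
  datalines;
1 Josse
2 Aangeyy
3 Otto
4 Lynn
5 Matthew
;
run;

data want_demo;
  set have_demo;
  retain re;
  * Compile the pattern once, on the first observation, and reuse its id;
  if _n_ = 1 then re = prxparse('s/(.)\1+/$1/i');
  * Assumed exception list: names listed here are left untouched;
  if upcase(Name) not in ('OTTO', 'LYNN', 'MATTHEW') then
    Name = prxchange(re, -1, Name);
  drop re;
run;

proc print data=want_demo noobs;
run;
```

Passing the compiled pattern id to prxchange instead of the pattern string is a common idiom. Whether it makes a measurable difference on a large table depends on the environment, so treat it as something to test, not as a guaranteed speed-up.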