BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_AMUH
Obsidian | Level 7

I've learnt about the nodupkey function and it's very useful in removing duplicated observations.

But, is there any way I could overwrite or remove duplicates within the column?

I'm unsure if I have phrased my question correctly, so please see the example below. 

Any help would be greatly appreciated. 

 

<ORIGINAL>

ID Pattern

1 AABCCA -> ABCA

 2 ABBCC -> ABC

3 ABBBCDC -> ABCDC

4 DDCBA -> DCBA

5 CCC -> C

 

<WANT>

ID Pattern

1 ABCA

2 ABC

3 ABCDC

4 DCBA

5 C

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  PATTERN1=prxchange('s/(.)(?=\1)/\2/',-1,PATTERN);
run;

 

 
PATTERN PATTERN1
AABCCA ABCA
ABBCC ABC
ABBBCDC ABCDC
DDCBA DCBA
CCC C

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

Hard to tell what you mean.  It looks like you mean that PATTERN is a character variable that has strings that look like "AABCCA -> ABCA" and you want to just keep the part after the last greater than sign.

data want;
  set have ;
  if index(pattern,'>') then pattern=scan(pattern,-1,'>');
  else do;
/* not sure what you want to do here 
  since all of your examples have the > sign
*/
  end;
run;
SAS_AMUH
Obsidian | Level 7

Thank you Tom. 

I'm very sorry, I totally forgot to remove those -> arrow signs.

They are not in the actual data. 

 

<ORIGINAL>

ID Pattern

1 AABCCA 

 2 ABBCC 

3 ABBBCDC 

4 DDCBA 

5 CCC 

 

<WANT>

ID Pattern

1 ABCA

2 ABC

3 ABCDC

4 DCBA

5 C

ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  PATTERN1=prxchange('s/(.)(?=\1)/\2/',-1,PATTERN);
run;

 

 
PATTERN PATTERN1
AABCCA ABCA
ABBCC ABC
ABBBCDC ABCDC
DDCBA DCBA
CCC C
SAS_AMUH
Obsidian | Level 7

Thank you so much. 

I would really appreciate it if you could please briefly tell me what each coding means.

I need to modify it to suit to my other similar data and I'd like to learn about this coding.

However, my SAS level is limited and I don't know what keywords I could use to look up in a help site like this or the guide books. 

ChrisNZ
Tourmaline | Level 20

This syntax (and the syntax in @Ksharp 's answer) uses regular expressions. It is not special to SAS. In fact it started with the PERL language if I am not mistaken.

There are many tutorials on the web. That's how I learnt.

I'll detail @Ksharp 's expression as I prefer it to mine. 

 

 

s          enter substitution mode
/          start of matching expression
(\w)       \w means: match any word character, brackets mean: capture the match 
\1+        \1 means: match the previous capture group, + means: at least once
/          end of matching expression, start of replacing expression
\1         replace with first capture group
/          end of expression

 

 

 

So we match one character (and capture it) and match more of the same character.

If this was found, we replace with the matched character.

ChrisNZ
Tourmaline | Level 20

@Kurt_Bremser

Do you know if SAS's first attempt at RegEx (the no longer documented or supported but still available RX functions) was using the posix syntax?

ChrisNZ
Tourmaline | Level 20

Mmm interesting . Who knows where they got the SAS patterns from....

Thank you.

Ksharp
Super User
data have;
input ID Pattern $;
cards;
1 AABCCA 
2 ABBCC 
3 ABBBCDC 
4 DDCBA 
5 CCC 
;
data want;
 set have;
 want=prxchange('s/(\w)\1+/\1/',-1,Pattern);
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1591 views
  • 10 likes
  • 5 in conversation