BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_AMUH
Obsidian | Level 7

I've learnt about the nodupkey function and it's very useful in removing duplicated observations.

But, is there any way I could overwrite or remove duplicates within the column?

I'm unsure if I have phrased my question correctly, so please see the example below. 

Any help would be greatly appreciated. 

 

<ORIGINAL>

ID Pattern

1 AABCCA -> ABCA

 2 ABBCC -> ABC

3 ABBBCDC -> ABCDC

4 DDCBA -> DCBA

5 CCC -> C

 

<WANT>

ID Pattern

1 ABCA

2 ABC

3 ABCDC

4 DCBA

5 C

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  PATTERN1=prxchange('s/(.)(?=\1)/\2/',-1,PATTERN);
run;

 

 
PATTERN PATTERN1
AABCCA ABCA
ABBCC ABC
ABBBCDC ABCDC
DDCBA DCBA
CCC C

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

Hard to tell what you mean.  It looks like you mean that PATTERN is a character variable that has strings that look like "AABCCA -> ABCA" and you want to just keep the part after the last greater than sign.

data want;
  set have ;
  if index(pattern,'>') then pattern=scan(pattern,-1,'>');
  else do;
/* not sure what you want to do here 
  since all of your examples have the > sign
*/
  end;
run;
SAS_AMUH
Obsidian | Level 7

Thank you Tom. 

I'm very sorry, I totally forgot to remove those -> arrow signs.

They are not in the actual data. 

 

<ORIGINAL>

ID Pattern

1 AABCCA 

 2 ABBCC 

3 ABBBCDC 

4 DDCBA 

5 CCC 

 

<WANT>

ID Pattern

1 ABCA

2 ABC

3 ABCDC

4 DCBA

5 C

ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  PATTERN1=prxchange('s/(.)(?=\1)/\2/',-1,PATTERN);
run;

 

 
PATTERN PATTERN1
AABCCA ABCA
ABBCC ABC
ABBBCDC ABCDC
DDCBA DCBA
CCC C
SAS_AMUH
Obsidian | Level 7

Thank you so much. 

I would really appreciate it if you could please briefly tell me what each coding means.

I need to modify it to suit to my other similar data and I'd like to learn about this coding.

However, my SAS level is limited and I don't know what keywords I could use to look up in a help site like this or the guide books. 

ChrisNZ
Tourmaline | Level 20

This syntax (and the syntax in @Ksharp 's answer) uses regular expressions. It is not special to SAS. In fact it started with the PERL language if I am not mistaken.

There are many tutorials on the web. That's how I learnt.

I'll detail @Ksharp 's expression as I prefer it to mine. 

 

 

s          enter substitution mode
/          start of matching expression
(\w)       \w means: match any word character, brackets mean: capture the match 
\1+        \1 means: match the previous capture group, + means: at least once
/          end of matching expression, start of replacing expression
\1         replace with first capture group
/          end of expression

 

 

 

So we match one character (and capture it) and match more of the same character.

If this was found, we replace with the matched character.

ChrisNZ
Tourmaline | Level 20

@Kurt_Bremser

Do you know if SAS's first attempt at RegEx (the no longer documented or supported but still available RX functions) was using the posix syntax?

ChrisNZ
Tourmaline | Level 20

Mmm interesting . Who knows where they got the SAS patterns from....

Thank you.

Ksharp
Super User
data have;
input ID Pattern $;
cards;
1 AABCCA 
2 ABBCC 
3 ABBBCDC 
4 DDCBA 
5 CCC 
;
data want;
 set have;
 want=prxchange('s/(\w)\1+/\1/',-1,Pattern);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 942 views
  • 10 likes
  • 5 in conversation