R_Win
Calcite | Level 5

Hi, in some observations the data is just one character repeated, and I want to suppress those records. I don't know the length in advance; it may be 3 or 100 characters. Example:

data Test;
input id$ 1-100;
cards;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
RUN;

The output should be: RTYUY QWEPO KOOL GONE


ballardw
Super User

Are the unwanted strings of repeated characters ALWAYS separated by spaces?

Will the repeated characters ALWAYS be the same character within a repeat group? (That is, will there never be an unwanted group like AAAAAZZZZZ?)

data_null__
Jade | Level 19

if missing(compress(id,first(id))) then delete;

R_Win
Calcite | Level 5

Thanks, it worked.

Haikuo
Onyx | Level 15

Try this:

data Test;
input id:$100.;
cards;
AAA
VVVVVVVVVVVVVVVVVVVVVV
EEEEEEEEEEEEEEEEEEEEEEEEEE
RTYUY
QWEPO
ZZZZZZZZZZZZZ
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KOOL
GONE
;

data want;
set test;
if lengthn(compress(id,first(id)))=0 then delete;
run;

proc print;RUN;

Haikuo

R_Win
Calcite | Level 5

Thanks, it's working.

ballardw
Super User

Your example INPUT statement reads only columns 1-100, but the example data line is longer than 100 characters, so it is not long enough to read the example data.

PGStats
Opal | Level 21

How about:

data Test;
  input;
  s = compbl(prxchange("s/\b(\w)\1{2,}\b//o", -1, _infile_));
  put s;
datalines;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;

It removes any mono-character word of length 3 or more, where "character" means any word character (letter, digit, or underscore). You could use the pattern "s/\b([[:alpha:]])\1{2,}\b//o" to remove only alphabetic mono-character words.

PG
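A minimal sketch of the alphabetic-only variant of PG's pattern, assuming all words arrive on one line as in the example. STRIP is added to drop the leading blank that COMPBL leaves once AAA is removed. The second data line (AAA 1111 KOOL) is hypothetical test data showing that a digit run such as 1111 survives this pattern, whereas the \w version would remove it.

```sas
/* Sketch: remove words made of 3+ copies of one alphabetic character */
data want;
  input;
  length s $ 200;
  /* [[:alpha:]] restricts the match to letters; \1{2,} needs 2+ more copies */
  s = strip(compbl(prxchange("s/\b([[:alpha:]])\1{2,}\b//o", -1, _infile_)));
  put s=;
datalines;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
AAA 1111 KOOL
;
run;
```

With this data the first line should reduce to RTYUY QWEPO KOOL GONE and the hypothetical second line to 1111 KOOL.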

Astounding
PROC Star

More questions.

Is there a minimum length for suppression? (Could a string only 2 characters long be suppressed?)

Do you maintain a list of exceptions? In your example, AAA might be a legitimate string for some applications.

Is each word on a separate line, or are multiple words on the same line of data?
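How answers to these questions could change the record-deletion approach can be sketched for the one-word-per-line case. The macro variable minlen and the exception list ('AAA') are hypothetical placeholders, and the comparison stays case-sensitive, as in the earlier replies.

```sas
%let minlen = 3;  /* hypothetical: shortest run that should be suppressed */

data want;
  input id :$100.;
  /* delete a record only if it is long enough, is one character repeated, */
  /* and is not on the (hypothetical) exception list                       */
  if lengthn(id) >= &minlen
     and missing(compress(id, first(id)))
     and id not in ('AAA') then delete;
datalines;
AAA
ZZ
VVVVVVVVVVVVVVVVVVVVVV
RTYUY
ZZZZZZZZZZZZZ
KOOL
;
run;
```

With these settings AAA (an exception) and ZZ (shorter than 3) are kept along with RTYUY and KOOL.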

ballardw
Super User

This is a brute-force method that works for your example. Caveats: single characters will be eliminated. A length check could be added so that the COMPRESS line executes only when the word is longer than the minimum acceptable duplication. Also, case is not taken into account: if AaA is supposed to be removed, it won't be unless UPCASE is applied.

The array size is arbitrary, but 1) it needs enough elements to hold every word in the string, and 2) each element needs to be long enough to contain the longest word.

 

data Test;
input id$ 1-127;
array t {100} $ 100 _t1 -_t100;
do i=1 to (countw(ID));
t{i} = scan(id,i);
/* blank out any word that is a single character repeated */
if compress(t{i},first(t{i})) = '' then t{i}=compress(t{i},first(t{i}));
end;
outstr = catx(' ', of _t1 - _t100); /* this is the hopefully desired output string*/
drop _t: i;
cards;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
RUN ;
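ballardw's caveats (minimum length, case) can be folded into a variant that skips the array and builds the output string word by word with CATX. In this sketch the macro variable minlen is a hypothetical setting, the 'i' modifier of COMPRESS makes the test case-insensitive so AaA is removed, INFILE DATALINES TRUNCOVER lets short lines be read with a long informat, and the second data line is hypothetical test data.

```sas
%let minlen = 2;  /* hypothetical: words of length 1 are always kept */

data want;
  infile datalines truncover;
  input id $200.;
  length word outstr $ 200;
  do i = 1 to countw(id, ' ');
    word = scan(id, i, ' ');
    /* skip words that are one character repeated, ignoring case ('i') */
    if lengthn(word) >= &minlen
       and missing(compress(word, first(word), 'i')) then continue;
    outstr = catx(' ', outstr, word);  /* append the kept word */
  end;
  drop i word;
datalines;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
AaA X KOOL
;
run;
```

Because OUTSTR is not retained, it starts out missing for every record, so each line gets its own result.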

tfrerichs
Calcite | Level 5

I would use Perl Regular Expressions and do it like this:

This code eliminates words made of a single letter repeated at least two times in sequence and surrounded by so-called word boundaries. The loop covers A-Z (ASCII codes 65 to 90), and the i (case-insensitive) modifier also catches a-z; other characters can be added, of course!

data Test;
  infile cards;
  input;
  s = _infile_;
  do i=65 to 90;
    s = prxchange(cats("s/\b", byte(i),"{2,}\b//i"), -1, s);
  end;
  s = strip(compbl(s));
cards;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
RUN ;

Kind Regards

Thomas

