Hi, some of the values in my data are just one character repeated (for example AAA or ZZZZZZZZZZZZZ), and I want to suppress those records. I don't know the length in advance; it may be 3 or 100 characters. Example:
data Test;
input id $ 1-100;
cards;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
run;
The output should be: RTYUY QWEPO KOOL GONE
Are the unwanted strings of repeated characters ALWAYS separated by spaces?
Will the repeated characters ALWAYS be the same character within a repeat group? (That is, will there never be an unwanted group such as AAAAAZZZZZ?)
if missing(compress(id,first(id))) then delete;
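The one-line test works because FIRST(id) returns the first character of the value and COMPRESS(id, FIRST(id)) deletes every occurrence of that character; if nothing is left, the value was a single character repeated. A small sketch that shows the intermediate values (the test strings and the dataset name CHECK are made up for illustration):

```sas
data check;
  length id rest $ 40;
  do id = 'AAA', 'RTYUY', 'ZZZZZ', 'KOOL';
    /* Remove every occurrence of the first character of ID */
    rest = compress(id, first(id));
    /* REPEATED = 1 when nothing is left, i.e. ID is one character repeated */
    repeated = missing(rest);
    output;
  end;
run;

proc print data=check;
run;
```

Note that a one-character value such as A is flagged as well; adding a LENGTHN(id) > 1 condition would keep it.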
Thanks, it worked.
Try this:
data Test;
input id:$100.;
cards;
AAA
VVVVVVVVVVVVVVVVVVVVVV
EEEEEEEEEEEEEEEEEEEEEEEEEE
RTYUY
QWEPO
ZZZZZZZZZZZZZ
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KOOL
GONE
;
data want;
set test;
if lengthn(compress(id,first(id)))=0 then delete;
run;
proc print;RUN;
Haikuo
Thanks, it's working.
Your example INPUT statement (columns 1-100) is not wide enough to read the example data line, which is longer than 100 characters.
How about:
data Test;
input;
s = compbl(prxchange("s/\b(\w)\1{2,}\b//o", -1, _infile_));
put s;
datalines;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
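It removes any word of three or more characters that consists of a single repeated character (letters, digits or underscores, since \w matches all three). To remove only words made of a single repeated letter, use the pattern "s/\b([[:alpha:]])\1{2,}\b//o" instead.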
PG
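To see the difference between the \w pattern and the [[:alpha:]] variant, both can be applied to the same line. This is a sketch: the sample line (with a made-up 1111 token), the dataset name COMPARE and the 200-character length are assumptions for illustration.

```sas
data compare;
  length s_word s_alpha $ 200;
  input;
  /* \w: drop any word made of one word character (letter, digit, _) repeated 3+ times */
  s_word  = strip(compbl(prxchange("s/\b(\w)\1{2,}\b//o", -1, _infile_)));
  /* [[:alpha:]]: drop only words made of one letter repeated 3+ times */
  s_alpha = strip(compbl(prxchange("s/\b([[:alpha:]])\1{2,}\b//o", -1, _infile_)));
datalines;
AAA 1111 VVVVVV RTYUY QWEPO KOOL GONE
;
run;

proc print data=compare;
run;
```

COMPBL collapses the blanks left behind by the removed words and STRIP trims the leading blank, so only the digit run 1111 differs between the two results.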
More questions.
Is there a minimum length for suppression? (Should a string only 2 characters long be suppressed?)
Do you maintain a list of exceptions? In your example, AAA might be a legitimate string for some applications.
Is each word on a separate line, or are multiple words on the same line of data?
This is a brute-force method that works for your example. Caveats: single characters will be eliminated. A length check could be added so that the COMPRESS line executes only when the word is longer than the minimum acceptable repeat length. Also, case is not taken into account: AaA will not be removed unless UPCASE is applied.
The array size is arbitrary, but 1) the array needs enough elements to hold every word on the line, and 2) each element needs to be long enough to hold the longest word.
data Test;
input id $ 1-127;
array t {100} $ 100 _t1-_t100;   /* one element per word on the line */
do i=1 to countw(id);
   t{i} = scan(id,i);
   /* blank out the word if it consists of a single repeated character */
   if compress(t{i},first(t{i})) = '' then t{i} = '';
end;
outstr = catx(' ', of _t1-_t100); /* this is the hopefully desired output string */
drop _t: i;
cards;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
RUN ;
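The caveats above can be handled without the fixed-size array by rebuilding the output string word by word with CATX. A minimal sketch, assuming words are separated by blanks, a 200-character line, and a hypothetical minimum length of 3 (the variable MINLEN) below which repeats are kept:

```sas
data want;
  length word outstr $ 200;
  retain minlen 3;   /* hypothetical threshold: keep repeats shorter than this */
  input;
  outstr = ' ';
  do i = 1 to countw(_infile_, ' ');
    word = scan(_infile_, i, ' ');
    /* Skip the word if it is long enough and, ignoring case,
       consists of a single repeated character */
    if lengthn(word) >= minlen and
       lengthn(compress(upcase(word), upcase(first(word)))) = 0 then continue;
    outstr = catx(' ', outstr, word);
  end;
  keep outstr;
datalines;
AAA VVVV aAa RTYUY QWEPO ZZ KOOL GONE
;
run;
```

Here aAa is dropped because both the word and its first character are upper-cased before COMPRESS, while ZZ survives because it is shorter than MINLEN. CATX ignores the initially blank OUTSTR, so no leading blank appears.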
I would use Perl Regular Expressions and do it like this:
This code eliminates words that consist of a single letter (A-Z, ASCII codes 65 to 90; lowercase a-z is matched as well because of the i modifier) repeated at least two times in sequence and surrounded by so-called word boundaries. Other characters can of course be added.
data Test;
infile cards;
input;
s = _infile_;
do i=65 to 90;
s = prxchange(cats("s/\b", byte(i),"{2,}\b//i"), -1, s);
end;
s = strip(compbl(s));
cards;
AAA VVVVVVVVVVVVVVVVVVVVVV EEEEEEEEEEEEEEEEEEEEEEEEEE RTYUY QWEPO ZZZZZZZZZZZZZ KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KOOL GONE
;
RUN ;
Kind Regards
Thomas
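Thomas's loop can be extended to other characters, such as digits, by looping over a string of characters instead of byte codes. A sketch, assuming the characters listed in the hypothetical variable CHARS are the ones you want to check; it compiles one pattern per character for every line, so the single back-reference pattern posted earlier in the thread is usually simpler.

```sas
data want2;
  length chars $ 36 s $ 200;
  /* Assumed set of characters to check: letters (case-insensitive via /i) and digits */
  chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  input;
  s = _infile_;
  do i = 1 to length(chars);
    /* Remove whole words made of 2 or more copies of this character */
    s = prxchange(cats('s/\b', substr(chars, i, 1), '{2,}\b//i'), -1, s);
  end;
  s = strip(compbl(s));
  keep s;
datalines;
AAA 1111 vvVV RTYUY QWEPO KOOL GONE
;
run;
```

Because of the \b anchors, the OO inside KOOL is not touched: there is no word boundary between K and O.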