Hi All,
Can you please suggest on this,
1. I have name table and need to clean the data according to
A). Where ever same alphabets are repeated more than 2 times need to fix 2 times only.
Ex: - HAVE WANT
------------- -----------
POOOR POOR
POOOOR POOR
BAAAB BAAB
YAHOOOO YAHOO
B). Where ever word is repeated more than one time need to delete seceond occerence.
EX:- Name1 Name2 Name3
--------- ---------- ----------
1. JAK LAK JAK - Here JAK repeate more than one time need to delete second JAK.
2. TOM TOM MIKE - Here TOM repeate more than one time need to delete second TOM.
Thanks in advance......!
For A)
data have;
input var1 :$20.;
var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);
datalines;
POOOR
POOOOR
BAAAB
YAHOoOO
YAAHOOOO
YAAHOOOOOOOOO111
;
run;
For B)
data have;
input (Name1 Name2 Name3 ) ($);
cards;
JAK LAK JAK
TOM TOM MIKE
;
run;
data want(drop=_:);
set have;
if _n_=1 then
do;
length _name $8.;
dcl hash h(hashexp:2);
_rc=h.defineKey('_Name');
_rc=h.defineDone();
end;
array names name1 - name3;
do over names;
_name=names;
if h.check()=0 then call missing(names);
else h.add();
end;
_rc=h.clear();
run;
For B) transpose your data and do a PROC SORT NODUPKEY.
Hi, I trust you were able to do B) having received the help from
. If not let me know, I am happy to help. But if you did, good for you.
For your A.) here is the answer:
data have;
input var1 $;
datalines;
POOOR
POOOOR
BAAAB
YAHOOOO
;
data flip;
set have;
group+1;
do i=1 to length(var1);
new_var=substr(var1,i,1);
if new_var eq lag(new_var) then count+1;
else count=1;
if count>2 then continue;
output;
end;
call missing(count);
run;
data want;
array v(10) $1;
call missing (of v{*});
c=0;
do until(last.group);
set flip;
by group;
c+1;
v(c)=new_var;
new=cats(of v
HTH,
Naveen Srinivasan
L&T Infotech
Hi naveen_srini
Thanks a lot and i got for B).
and Without keep and drop statements output is like this from your quirey,
v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 | c | num | var1 | group | i | new_var | count | new |
P | O | O | R | 4 | 1 | POOOR | 1 | 5 | R | 1 | POOR | ||||||
P | O | O | R | 4 | 2 | POOOOR | 2 | 6 | R | 1 | POOR | ||||||
B | A | A | B | 4 | 3 | BAAAB | 3 | 5 | B | 1 | BAAB | ||||||
Y | A | H | O | O | 5 | 4 | YAHOOOO | 4 | 5 | O | 2 | YAHOO |
But here I have Multiple Name fields along with hundreds of other fields, and we dont want change or drop the other fields.
original data set having like this
Name1 | Name2 | Name3 | Name4 | Name5 | var1-Var50 |
POOOR | ABCD | ||||
LMN | POOOOR | ||||
BAAAB | |||||
XYZ | K | L | YAHOOOO |
My request is where ever same alphabet is repeated more than two times need to fix with.
Thank You.
I would suggest your best bet is to look at two techs:
a) arrays, in your instance above you want to do a method on all variables name{x} so you can do a loop:
data test;
set have;
array name{5}; /* this setups the array to loop over all variables name1-5 */
do i=1 to 5;
...method;
end;
run;
b) Perl regular expressions: http://www2.sas.com/proceedings/sugi29/265-29.pdf
This can be used to find text patterns and replace them. An example of characters: perl - What regex can match sequences of the same character? - Stack Overflow
A combination the above two should be able to resolve your issue.
A)
data have; input var1 $; var2=prxchange('s/(A)A|(B)B|(C)C|(D)D|(E)E|(F)F|(G)G|(H)H|(I)I|(J)J|(K)K|(L)L|(M)M|(N)N|(O)O|(P)P|(Q)Q|(R)R|(S)S|(T)T|(U)U|(V)V|(W)W|(X)X|(Y)Y|(Z)Z/$1/i',1,var1); datalines; POOOR POOOOR BAAAB YAHOOOO ; run;
B)
data x; input (Name1 Name2 Name3 ) ($); cards; JAK LAK JAK TOM TOM MIKE ; run; data w; set x; array na{*} $ name: ; array x{999} $ 100 _temporary_; n=0;call missing(of x{*}); do i=1 to dim(na); if na{i} not in x then do;n+1;x{n}=na{i};end; else call missing(na{i}); end; drop n i j; run;
Xia Keshan
For A)
data have;
input var1 :$20.;
var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);
datalines;
POOOR
POOOOR
BAAAB
YAHOoOO
YAAHOOOO
YAAHOOOOOOOOO111
;
run;
For B)
data have;
input (Name1 Name2 Name3 ) ($);
cards;
JAK LAK JAK
TOM TOM MIKE
;
run;
data want(drop=_:);
set have;
if _n_=1 then
do;
length _name $8.;
dcl hash h(hashexp:2);
_rc=h.defineKey('_Name');
_rc=h.defineDone();
end;
array names name1 - name3;
do over names;
_name=names;
if h.check()=0 then call missing(names);
else h.add();
end;
_rc=h.clear();
run;
Learn something new.
Thank you everyone for immense support..!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.