Hi All,
Can you please suggest on this,
1. I have name table and need to clean the data according to
A). Where ever same alphabets are repeated more than 2 times need to fix 2 times only.
Ex: - HAVE WANT
------------- -----------
POOOR POOR
POOOOR POOR
BAAAB BAAB
YAHOOOO YAHOO
B). Where ever word is repeated more than one time need to delete seceond occerence.
EX:- Name1 Name2 Name3
--------- ---------- ----------
1. JAK LAK JAK - Here JAK repeate more than one time need to delete second JAK.
2. TOM TOM MIKE - Here TOM repeate more than one time need to delete second TOM.
Thanks in advance......!
For A)
data have;
input var1 :$20.;
var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);
datalines;
POOOR
POOOOR
BAAAB
YAHOoOO
YAAHOOOO
YAAHOOOOOOOOO111
;
run;
For B)
data have;
input (Name1 Name2 Name3 ) ($);
cards;
JAK LAK JAK
TOM TOM MIKE
;
run;
data want(drop=_:);
set have;
if _n_=1 then
do;
length _name $8.;
dcl hash h(hashexp:2);
_rc=h.defineKey('_Name');
_rc=h.defineDone();
end;
array names name1 - name3;
do over names;
_name=names;
if h.check()=0 then call missing(names);
else h.add();
end;
_rc=h.clear();
run;
For B) transpose your data and do a PROC SORT NODUPKEY.
Hi, I trust you were able to do B) having received the help from
. If not let me know, I am happy to help. But if you did, good for you.
For your A.) here is the answer:
data have;
input var1 $;
datalines;
POOOR
POOOOR
BAAAB
YAHOOOO
;
data flip;
set have;
group+1;
do i=1 to length(var1);
new_var=substr(var1,i,1);
if new_var eq lag(new_var) then count+1;
else count=1;
if count>2 then continue;
output;
end;
call missing(count);
run;
data want;
array v(10) $1;
call missing (of v{*});
c=0;
do until(last.group);
set flip;
by group;
c+1;
v(c)=new_var;
new=cats(of v
HTH,
Naveen Srinivasan
L&T Infotech
Hi naveen_srini
Thanks a lot and i got for B).
and Without keep and drop statements output is like this from your quirey,
v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 | c | num | var1 | group | i | new_var | count | new |
P | O | O | R | 4 | 1 | POOOR | 1 | 5 | R | 1 | POOR | ||||||
P | O | O | R | 4 | 2 | POOOOR | 2 | 6 | R | 1 | POOR | ||||||
B | A | A | B | 4 | 3 | BAAAB | 3 | 5 | B | 1 | BAAB | ||||||
Y | A | H | O | O | 5 | 4 | YAHOOOO | 4 | 5 | O | 2 | YAHOO |
But here I have Multiple Name fields along with hundreds of other fields, and we dont want change or drop the other fields.
original data set having like this
Name1 | Name2 | Name3 | Name4 | Name5 | var1-Var50 |
POOOR | ABCD | ||||
LMN | POOOOR | ||||
BAAAB | |||||
XYZ | K | L | YAHOOOO |
My request is where ever same alphabet is repeated more than two times need to fix with.
Thank You.
I would suggest your best bet is to look at two techs:
a) arrays, in your instance above you want to do a method on all variables name{x} so you can do a loop:
data test;
set have;
array name{5}; /* this setups the array to loop over all variables name1-5 */
do i=1 to 5;
...method;
end;
run;
b) Perl regular expressions: http://www2.sas.com/proceedings/sugi29/265-29.pdf
This can be used to find text patterns and replace them. An example of characters: perl - What regex can match sequences of the same character? - Stack Overflow
A combination the above two should be able to resolve your issue.
A)
data have; input var1 $; var2=prxchange('s/(A)A|(B)B|(C)C|(D)D|(E)E|(F)F|(G)G|(H)H|(I)I|(J)J|(K)K|(L)L|(M)M|(N)N|(O)O|(P)P|(Q)Q|(R)R|(S)S|(T)T|(U)U|(V)V|(W)W|(X)X|(Y)Y|(Z)Z/$1/i',1,var1); datalines; POOOR POOOOR BAAAB YAHOOOO ; run;
B)
data x; input (Name1 Name2 Name3 ) ($); cards; JAK LAK JAK TOM TOM MIKE ; run; data w; set x; array na{*} $ name: ; array x{999} $ 100 _temporary_; n=0;call missing(of x{*}); do i=1 to dim(na); if na{i} not in x then do;n+1;x{n}=na{i};end; else call missing(na{i}); end; drop n i j; run;
Xia Keshan
For A)
data have;
input var1 :$20.;
var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);
datalines;
POOOR
POOOOR
BAAAB
YAHOoOO
YAAHOOOO
YAAHOOOOOOOOO111
;
run;
For B)
data have;
input (Name1 Name2 Name3 ) ($);
cards;
JAK LAK JAK
TOM TOM MIKE
;
run;
data want(drop=_:);
set have;
if _n_=1 then
do;
length _name $8.;
dcl hash h(hashexp:2);
_rc=h.defineKey('_Name');
_rc=h.defineDone();
end;
array names name1 - name3;
do over names;
_name=names;
if h.check()=0 then call missing(names);
else h.add();
end;
_rc=h.clear();
run;
Learn something new.
Thank you everyone for immense support..!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.