DATA Step, Macro, Functions and more

NAME SALUTATIONS.....! (Cleaning)

Accepted Solution Solved
Reply
Contributor
Posts: 46
Accepted Solution

NAME SALUTATIONS.....! (Cleaning)

Hi All,

Can you please suggest on this,

1. I have name table and need to clean the data according to

A). Where ever same alphabets are repeated more than 2 times need to fix 2 times only.

   

Ex: - HAVE            WANT

        -------------      -----------

        POOOR         POOR

        POOOOR      POOR

        BAAAB          BAAB

        YAHOOOO   YAHOO

B). Where ever word is repeated more than one time need to delete seceond occerence.

EX:- Name1      Name2     Name3   

        ---------       ----------    ----------   

   1.  JAK           LAK          JAK             - Here JAK repeate more than one time need to delete second JAK.

   2.  TOM         TOM         MIKE           - Here TOM repeate more than one time  need to delete second TOM.

Thanks in advance......!


Accepted Solutions
Solution
‎03-11-2015 09:57 AM
Respected Advisor
Posts: 3,887

Re: NAME SALUTATIONS.....! (Cleaning)

For A)

data have;

  input var1 :$20.;

  var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);

  datalines;

POOOR       

POOOOR    

BAAAB       

YAHOoOO

YAAHOOOO

YAAHOOOOOOOOO111

;

run;

For B)

data have;

  input (Name1 Name2 Name3 ) ($);

  cards;

JAK LAK JAK

TOM TOM MIKE

;

run;

data want(drop=_Smiley Happy;

  set have;

  if _n_=1 then

    do;

      length _name $8.;

      dcl hash h(hashexp:2);

      _rc=h.defineKey('_Name');

      _rc=h.defineDone();

    end;

  array names name1 - name3; 

  do over names;

    _name=names;

    if h.check()=0 then call missing(names);

    else h.add();

  end;

  _rc=h.clear();

run;

View solution in original post


All Replies
Super User
Posts: 5,256

Re: NAME SALUTATIONS.....! (Cleaning)

For B) transpose your data and do a PROC SORT NODUPKEY.

Data never sleeps
Frequent Contributor
Posts: 115

Re: NAME SALUTATIONS.....! (Cleaning)

Hi, I trust you were able to do B) having received the help from

Master

. If not let me know, I am happy to help. But if you did, good for you.

For your A.) here is the answer:

data have;
input var1 $;
datalines;
POOOR        
POOOOR     
BAAAB        
YAHOOOO
;

data flip;
set have;
group+1;
do i=1 to length(var1);
new_var=substr(var1,i,1);
if new_var eq lag(new_var) then count+1;
else count=1;
if count>2 then continue;
output;
end;
call missing(count);
run;

data want;
array v(10) $1;
call missing (of v{*});
c=0;
do until(last.group);
set flip;
by group;
c+1;
v(c)=new_var;
new=cats(of v

  • );
    end;
    keep new;
    run;
  • HTH,

    Naveen Srinivasan

    L&T Infotech

    Contributor
    Posts: 46

    Re: NAME SALUTATIONS.....! (Cleaning)

    Hi

    Thanks a lot and i got for B).

    and Without keep and drop statements output is like this from your quirey,

    v1v2v3v4v5v6v7v8v9v10cnumvar1groupinew_varcountnew
    POOR 41POOOR15R1POOR
    POOR 42POOOOR26R1POOR
    BAAB 43BAAAB35B1BAAB
    YAHOO 54YAHOOOO45O2YAHOO

    But here I have Multiple Name fields along with hundreds of other fields, and we dont want change or drop the other fields.

    original data set having like this

    Name1Name2Name3Name4Name5var1-Var50
    POOORABCD
    LMNPOOOOR
    BAAAB
    XYZKLYAHOOOO

    My request is where ever same alphabet is repeated more than two times need to fix with.

    Thank You.

    Super User
    Super User
    Posts: 7,401

    Re: NAME SALUTATIONS.....! (Cleaning)

    I would suggest your best bet is to look at two techs:

    a) arrays, in your instance above you want to do a method on all variables name{x} so you can do a loop:

    data test;

         set have;

         array name{5};  /* this setups the array to loop over all variables name1-5 */

         do i=1 to 5;

              ...method;

         end;

    run;

    b) Perl regular expressions: http://www2.sas.com/proceedings/sugi29/265-29.pdf

    This can be used to find text patterns and replace them.  An example of characters: perl - What regex can match sequences of the same character? - Stack Overflow 

    A combination the above two should be able to resolve your issue.

    Super User
    Posts: 9,676

    Re: NAME SALUTATIONS.....! (Cleaning)

    A)

    data have;
    input var1 $;
    var2=prxchange('s/(A)A|(B)B|(C)C|(D)D|(E)E|(F)F|(G)G|(H)H|(I)I|(J)J|(K)K|(L)L|(M)M|(N)N|(O)O|(P)P|(Q)Q|(R)R|(S)S|(T)T|(U)U|(V)V|(W)W|(X)X|(Y)Y|(Z)Z/$1/i',1,var1);
    datalines;
    POOOR         
    POOOOR      
    BAAAB         
    YAHOOOO
    ;
    run;
    

    B)

    data x;
    input (Name1      Name2     Name3 ) ($);  
    cards;
    JAK           LAK          JAK  
    TOM         TOM         MIKE  
    ;
    run;
    data w;
     set x;
     array na{*} $ name:     ;
     array x{999} $ 100 _temporary_;
     n=0;call missing(of x{*});
     do i=1 to dim(na);
      if na{i} not in x then do;n+1;x{n}=na{i};end;
       else call missing(na{i});
     end;
     drop n i j;
    run;
    

    Xia Keshan

    Solution
    ‎03-11-2015 09:57 AM
    Respected Advisor
    Posts: 3,887

    Re: NAME SALUTATIONS.....! (Cleaning)

    For A)

    data have;

      input var1 :$20.;

      var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);

      datalines;

    POOOR       

    POOOOR    

    BAAAB       

    YAHOoOO

    YAAHOOOO

    YAAHOOOOOOOOO111

    ;

    run;

    For B)

    data have;

      input (Name1 Name2 Name3 ) ($);

      cards;

    JAK LAK JAK

    TOM TOM MIKE

    ;

    run;

    data want(drop=_Smiley Happy;

      set have;

      if _n_=1 then

        do;

          length _name $8.;

          dcl hash h(hashexp:2);

          _rc=h.defineKey('_Name');

          _rc=h.defineDone();

        end;

      array names name1 - name3; 

      do over names;

        _name=names;

        if h.check()=0 then call missing(names);

        else h.add();

      end;

      _rc=h.clear();

    run;

    Super User
    Posts: 9,676

    Re: NAME SALUTATIONS.....! (Cleaning)

    Learn something new.

    Contributor
    Posts: 46

    Re: NAME SALUTATIONS.....! (Cleaning)

    Thank you everyone for immense support..! Smiley Happy

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 8 replies
    • 420 views
    • 13 likes
    • 6 in conversation