BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas_lak
Quartz | Level 8

Hi All,

Can you please suggest on this,

1. I have name table and need to clean the data according to

A). Where ever same alphabets are repeated more than 2 times need to fix 2 times only.

   

Ex: - HAVE            WANT

        -------------      -----------

        POOOR         POOR

        POOOOR      POOR

        BAAAB          BAAB

        YAHOOOO   YAHOO

B). Where ever word is repeated more than one time need to delete seceond occerence.

EX:- Name1      Name2     Name3   

        ---------       ----------    ----------   

   1.  JAK           LAK          JAK             - Here JAK repeate more than one time need to delete second JAK.

   2.  TOM         TOM         MIKE           - Here TOM repeate more than one time  need to delete second TOM.

Thanks in advance......!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

For A)

data have;

  input var1 :$20.;

  var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);

  datalines;

POOOR       

POOOOR    

BAAAB       

YAHOoOO

YAAHOOOO

YAAHOOOOOOOOO111

;

run;

For B)

data have;

  input (Name1 Name2 Name3 ) ($);

  cards;

JAK LAK JAK

TOM TOM MIKE

;

run;

data want(drop=_:);

  set have;

  if _n_=1 then

    do;

      length _name $8.;

      dcl hash h(hashexp:2);

      _rc=h.defineKey('_Name');

      _rc=h.defineDone();

    end;

  array names name1 - name3; 

  do over names;

    _name=names;

    if h.check()=0 then call missing(names);

    else h.add();

  end;

  _rc=h.clear();

run;

View solution in original post

8 REPLIES 8
LinusH
Tourmaline | Level 20

For B) transpose your data and do a PROC SORT NODUPKEY.

Data never sleeps
naveen_srini
Quartz | Level 8

Hi, I trust you were able to do B) having received the help from

Master

. If not let me know, I am happy to help. But if you did, good for you.

For your A.) here is the answer:

data have;
input var1 $;
datalines;
POOOR        
POOOOR     
BAAAB        
YAHOOOO
;

data flip;
set have;
group+1;
do i=1 to length(var1);
new_var=substr(var1,i,1);
if new_var eq lag(new_var) then count+1;
else count=1;
if count>2 then continue;
output;
end;
call missing(count);
run;

data want;
array v(10) $1;
call missing (of v{*});
c=0;
do until(last.group);
set flip;
by group;
c+1;
v(c)=new_var;
new=cats(of v

  • );
    end;
    keep new;
    run;
  • HTH,

    Naveen Srinivasan

    L&T Infotech

    sas_lak
    Quartz | Level 8

    Hi

    Thanks a lot and i got for B).

    and Without keep and drop statements output is like this from your quirey,

    v1v2v3v4v5v6v7v8v9v10cnumvar1groupinew_varcountnew
    POOR 41POOOR15R1POOR
    POOR 42POOOOR26R1POOR
    BAAB 43BAAAB35B1BAAB
    YAHOO 54YAHOOOO45O2YAHOO

    But here I have Multiple Name fields along with hundreds of other fields, and we dont want change or drop the other fields.

    original data set having like this

    Name1Name2Name3Name4Name5var1-Var50
    POOORABCD
    LMNPOOOOR
    BAAAB
    XYZKLYAHOOOO

    My request is where ever same alphabet is repeated more than two times need to fix with.

    Thank You.

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    I would suggest your best bet is to look at two techs:

    a) arrays, in your instance above you want to do a method on all variables name{x} so you can do a loop:

    data test;

         set have;

         array name{5};  /* this setups the array to loop over all variables name1-5 */

         do i=1 to 5;

              ...method;

         end;

    run;

    b) Perl regular expressions: http://www2.sas.com/proceedings/sugi29/265-29.pdf

    This can be used to find text patterns and replace them.  An example of characters: perl - What regex can match sequences of the same character? - Stack Overflow 

    A combination the above two should be able to resolve your issue.

    Ksharp
    Super User

    A)

    data have;
    input var1 $;
    var2=prxchange('s/(A)A|(B)B|(C)C|(D)D|(E)E|(F)F|(G)G|(H)H|(I)I|(J)J|(K)K|(L)L|(M)M|(N)N|(O)O|(P)P|(Q)Q|(R)R|(S)S|(T)T|(U)U|(V)V|(W)W|(X)X|(Y)Y|(Z)Z/$1/i',1,var1);
    datalines;
    POOOR         
    POOOOR      
    BAAAB         
    YAHOOOO
    ;
    run;
    

    B)

    data x;
    input (Name1      Name2     Name3 ) ($);  
    cards;
    JAK           LAK          JAK  
    TOM         TOM         MIKE  
    ;
    run;
    data w;
     set x;
     array na{*} $ name:     ;
     array x{999} $ 100 _temporary_;
     n=0;call missing(of x{*});
     do i=1 to dim(na);
      if na{i} not in x then do;n+1;x{n}=na{i};end;
       else call missing(na{i});
     end;
     drop n i j;
    run;
    

    Xia Keshan

    Patrick
    Opal | Level 21

    For A)

    data have;

      input var1 :$20.;

      var2=prxchange('s/([[:alpha:]])(\1+)/\1\1/oi',-1,var1);

      datalines;

    POOOR       

    POOOOR    

    BAAAB       

    YAHOoOO

    YAAHOOOO

    YAAHOOOOOOOOO111

    ;

    run;

    For B)

    data have;

      input (Name1 Name2 Name3 ) ($);

      cards;

    JAK LAK JAK

    TOM TOM MIKE

    ;

    run;

    data want(drop=_:);

      set have;

      if _n_=1 then

        do;

          length _name $8.;

          dcl hash h(hashexp:2);

          _rc=h.defineKey('_Name');

          _rc=h.defineDone();

        end;

      array names name1 - name3; 

      do over names;

        _name=names;

        if h.check()=0 then call missing(names);

        else h.add();

      end;

      _rc=h.clear();

    run;

    Ksharp
    Super User

    Learn something new.

    sas_lak
    Quartz | Level 8

    Thank you everyone for immense support..! Smiley Happy

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 8 replies
    • 1619 views
    • 13 likes
    • 6 in conversation