DATA Step, Macro, Functions and more

Concatenate strings in alphabetic order

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Concatenate strings in alphabetic order

My first column is name_1 and second column is name_2. All other columns contain variables relating to the combination of name_1 and name_2.

The order of name_1 and name_2 is arbitrary.

My dataset contains identical rows where only name_1 and name_2 are in reverse order. I need to delete these rows.

Is it possible to concatenate name_1 and name_2 in alphabetic order? In that case I could delete duplicated rows where only name_1 and name_2 are in reverse order.

Thanks for your advise,

Karen


Accepted Solutions
Solution
‎02-11-2013 03:45 PM
Respected Advisor
Posts: 3,124

Re: Concatenate strings in alphabetic order

Not sure if you can make use of something like this:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

;

data want;

  set have;

  array t n:;

call sortc(of t(*));

/*length cat $2.;

do over t;

  cat=cats(cat,t);

end;*/

run;

proc print;run;

Haikuo

View solution in original post


All Replies
Solution
‎02-11-2013 03:45 PM
Respected Advisor
Posts: 3,124

Re: Concatenate strings in alphabetic order

Not sure if you can make use of something like this:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

;

data want;

  set have;

  array t n:;

call sortc(of t(*));

/*length cat $2.;

do over t;

  cat=cats(cat,t);

end;*/

run;

proc print;run;

Haikuo

Respected Advisor
Posts: 3,777

Re: Concatenate strings in alphabetic order

how about CATS(of t

  • )
  • Respected Advisor
    Posts: 3,124

    Re: Concatenate strings in alphabetic order

    Yes, DN. No loop is needed here. Thanks for sharing, Haikuo

    Occasional Contributor
    Posts: 6

    Re: Concatenate strings in alphabetic order

    It works! Thank very much for your help

    Karen

    Occasional Contributor
    Posts: 6

    Re: Concatenate strings in alphabetic order

    It works! Thank very much for your help

    Karen

    Super User
    Posts: 5,071

    Re: Concatenate strings in alphabetic order

    Hai.kuo (or others?),

    If you're interested and have the time, here's an approach.  I just don't have the time these days, and you'll definitely be faster at this than I am.

    Create a hash table, but rename the keys as they are loaded (dataset:have(rename=(name_1=name_2 name_2=name_1))

    Does every hash table require data, or is it just sufficient to create the keys?  You might have to manufacture a dummy data element.

    Then use the check() method (based on the original values without renaming) to see if a record should be deleted.  This will require checking a second condition, such as name_1 > name_2 so that you don't delete both records of a pair.

    If nobody tries this, I will eventually.  But it might take a month or so.

    Respected Advisor
    Posts: 3,124

    Re: Concatenate strings in alphabetic order

    Astounding, I am overwhelmingly honored, and flattered. Not sure if the following attempt can keep up with your expectation or reflect your thoughts, and it has not been rigorously tested:

    data have;

    input (n1 n2) (:$1.);

    cards;

    a b

    b a

    a b

    b c

    ;

    data _null_;

      if _n_=1 then do;

      if 0 then set have (rename=(n1=n_1 n2=n_2));

    declare hash h(dataset:'have (rename=(n1=n_1 n2=n_2))', multidata:'y');

    h.definekey('n_1');

    h.definedata(all:'y');

    h.definedone();

      end;

    set have end=last;

      ct=0;

      do rc=h.find(key:n1) by 0 while (rc=0 and n2=n_2);

      ct=ct+1;

    rc=h.find_next(key:n1);

      if ct>1 then h.remove(key:n1);

      end;

      if ct>1 then ct=1;

      do rc=h.find(key:n2) by 0 while (rc=0 and n1=n_1);

      ct=ct+1;

    rc=h.find_next(key:n2);

    if ct>1 then h.remove(key:n2);

    end;

    if last then h.output(dataset:'want');

    run;

      

      

    Haikuo

    Super User
    Posts: 5,071

    Re: Concatenate strings in alphabetic order

    Hmmm... I was anticipating handling simpler cases where the same combination was not allowed to repeat.  So the idea is that there couldn't be multiple incoming observations with 'a b', but there could both 'a b' and 'b a'.  I can't fix the syntax, but here's what I was thinking:

    data want;

       if _n_=1 then do;

          if 0 then set have (keep=n1 n2);

          declare hash h(dataset:'have (rename=(n2=n1 n1=n2))'); 

          h.definekey('n1', 'n2');

          h.definedone();

       end;

       set have;

       if h.check=0 and n1 > n2 then delete;

    run;

    It's 100% untested at this point, and probably contains a few syntax errors.  Can you define a hash table with keys only and no data?  I haven't tested.

    It requires that n1 and n2 have the same definition (either both numeric, or both character with the same length).  But after the bugs are worked out, I think it gives you a simple way to remove the "reverse" combinations, all in one step.  (As usual, "simple" is in the eye of the beholder.)

    Super User
    Posts: 5,071

    Re: Concatenate strings in alphabetic order

    Just for the record, I finally had time to test this approach.  It works fine, except for one major drawback.  SAS objects to:

    declare hash h(dataset: 'have (rename=(n2=n1 n1=n2))');

    It will perform the same loading just fine if I go through an extra step just to copy over the data set.  Here is what worked:

    data have;
    input n1 $ n2 $ amount;
    cards;
    a b 1
    a b 2
    b a 3
    a c 4
    c d 5
    ;

    data stupid;
       set have;
       drop amount;
       rename n1=n2 n2=n1;
    run;

    data want;
       if _n_=1 then do;
          if 0 then set have (keep=n1 n2);
         * declare hash h(dataset:'have (rename=(n1=n2 n2=n1))');
          declare hash h (dataset: 'stupid');
          h.definekey('n1', 'n2');
          h.definedone();
       end;
       set have;
       if h.check()=0 and n1 > n2 then delete;
    run;

    A few notes:

    1. Duplicates in the data set used to create the hash table are permitted ... and by default are ignored.  That's not an issue here ... we only need to identify all the existing combinations of n1 and n2.

    2. There was no problem creating a hash table with keys only and no data elements.

    3. Reversing the names before loading the keys seems like an interesting way to approach the problem.  The CHECK method is then able to find out whether the reverse order ever appeared in the original data.  However, it is still necessary to look for a second condition such as n1 > n2.  If you don't do that, then the program would remove both sides of the matched pairs (in this case, would remove all the 'a b' records as well as the 'b a' records).


    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 9 replies
    • 1612 views
    • 6 likes
    • 4 in conversation