BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KarenGeurts
Calcite | Level 5

My first column is name_1 and second column is name_2. All other columns contain variables relating to the combination of name_1 and name_2.

The order of name_1 and name_2 is arbitrary.

My dataset contains identical rows where only name_1 and name_2 are in reverse order. I need to delete these rows.

Is it possible to concatenate name_1 and name_2 in alphabetic order? In that case I could delete duplicated rows where only name_1 and name_2 are in reverse order.

Thanks for your advise,

Karen

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Not sure if you can make use of something like this:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

;

data want;

  set have;

  array t n:;

call sortc(of t(*));

/*length cat $2.;

do over t;

  cat=cats(cat,t);

end;*/

run;

proc print;run;

Haikuo

View solution in original post

9 REPLIES 9
Haikuo
Onyx | Level 15

Not sure if you can make use of something like this:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

;

data want;

  set have;

  array t n:;

call sortc(of t(*));

/*length cat $2.;

do over t;

  cat=cats(cat,t);

end;*/

run;

proc print;run;

Haikuo

data_null__
Jade | Level 19

how about CATS(of t

  • )
  • Haikuo
    Onyx | Level 15

    Yes, DN. No loop is needed here. Thanks for sharing, Haikuo

    KarenGeurts
    Calcite | Level 5

    It works! Thank very much for your help

    Karen

    KarenGeurts
    Calcite | Level 5

    It works! Thank very much for your help

    Karen

    Astounding
    PROC Star

    Hai.kuo (or others?),

    If you're interested and have the time, here's an approach.  I just don't have the time these days, and you'll definitely be faster at this than I am.

    Create a hash table, but rename the keys as they are loaded (dataset:have(rename=(name_1=name_2 name_2=name_1))

    Does every hash table require data, or is it just sufficient to create the keys?  You might have to manufacture a dummy data element.

    Then use the check() method (based on the original values without renaming) to see if a record should be deleted.  This will require checking a second condition, such as name_1 > name_2 so that you don't delete both records of a pair.

    If nobody tries this, I will eventually.  But it might take a month or so.

    Haikuo
    Onyx | Level 15

    Astounding, I am overwhelmingly honored, and flattered. Not sure if the following attempt can keep up with your expectation or reflect your thoughts, and it has not been rigorously tested:

    data have;

    input (n1 n2) (:$1.);

    cards;

    a b

    b a

    a b

    b c

    ;

    data _null_;

      if _n_=1 then do;

      if 0 then set have (rename=(n1=n_1 n2=n_2));

    declare hash h(dataset:'have (rename=(n1=n_1 n2=n_2))', multidata:'y');

    h.definekey('n_1');

    h.definedata(all:'y');

    h.definedone();

      end;

    set have end=last;

      ct=0;

      do rc=h.find(key:n1) by 0 while (rc=0 and n2=n_2);

      ct=ct+1;

    rc=h.find_next(key:n1);

      if ct>1 then h.remove(key:n1);

      end;

      if ct>1 then ct=1;

      do rc=h.find(key:n2) by 0 while (rc=0 and n1=n_1);

      ct=ct+1;

    rc=h.find_next(key:n2);

    if ct>1 then h.remove(key:n2);

    end;

    if last then h.output(dataset:'want');

    run;

      

      

    Haikuo

    Astounding
    PROC Star

    Hmmm... I was anticipating handling simpler cases where the same combination was not allowed to repeat.  So the idea is that there couldn't be multiple incoming observations with 'a b', but there could both 'a b' and 'b a'.  I can't fix the syntax, but here's what I was thinking:

    data want;

       if _n_=1 then do;

          if 0 then set have (keep=n1 n2);

          declare hash h(dataset:'have (rename=(n2=n1 n1=n2))'); 

          h.definekey('n1', 'n2');

          h.definedone();

       end;

       set have;

       if h.check=0 and n1 > n2 then delete;

    run;

    It's 100% untested at this point, and probably contains a few syntax errors.  Can you define a hash table with keys only and no data?  I haven't tested.

    It requires that n1 and n2 have the same definition (either both numeric, or both character with the same length).  But after the bugs are worked out, I think it gives you a simple way to remove the "reverse" combinations, all in one step.  (As usual, "simple" is in the eye of the beholder.)

    Astounding
    PROC Star

    Just for the record, I finally had time to test this approach.  It works fine, except for one major drawback.  SAS objects to:

    declare hash h(dataset: 'have (rename=(n2=n1 n1=n2))');

    It will perform the same loading just fine if I go through an extra step just to copy over the data set.  Here is what worked:

    data have;
    input n1 $ n2 $ amount;
    cards;
    a b 1
    a b 2
    b a 3
    a c 4
    c d 5
    ;

    data stupid;
       set have;
       drop amount;
       rename n1=n2 n2=n1;
    run;

    data want;
       if _n_=1 then do;
          if 0 then set have (keep=n1 n2);
         * declare hash h(dataset:'have (rename=(n1=n2 n2=n1))');
          declare hash h (dataset: 'stupid');
          h.definekey('n1', 'n2');
          h.definedone();
       end;
       set have;
       if h.check()=0 and n1 > n2 then delete;
    run;

    A few notes:

    1. Duplicates in the data set used to create the hash table are permitted ... and by default are ignored.  That's not an issue here ... we only need to identify all the existing combinations of n1 and n2.

    2. There was no problem creating a hash table with keys only and no data elements.

    3. Reversing the names before loading the keys seems like an interesting way to approach the problem.  The CHECK method is then able to find out whether the reverse order ever appeared in the original data.  However, it is still necessary to look for a second condition such as n1 > n2.  If you don't do that, then the program would remove both sides of the matched pairs (in this case, would remove all the 'a b' records as well as the 'b a' records).


    sas-innovate-2024.png

    Don't miss out on SAS Innovate - Register now for the FREE Livestream!

    Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 9 replies
    • 5743 views
    • 6 likes
    • 4 in conversation