Solved: Re: Concatenate strings in alphabetic order

KarenGeurts · Posted 02-11-2013 03:36 PM

My first column is name_1 and second column is name_2. All other columns contain variables relating to the combination of name_1 and name_2.

The order of name_1 and name_2 is arbitrary.

My dataset contains identical rows where only name_1 and name_2 are in reverse order. I need to delete these rows.

Is it possible to concatenate name_1 and name_2 in alphabetic order? In that case I could delete duplicated rows where only name_1 and name_2 are in reverse order.

Thanks for your advise,

Karen

Haikuo · Posted 02-11-2013 03:45 PM

Not sure if you can make use of something like this:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

;

data want;

set have;

array t n:;

call sortc(of t(*));

/*length cat $2.;

do over t;

cat=cats(cat,t);

end;*/

run;

proc print;run;

Haikuo

View solution in original post

Haikuo · Posted 02-11-2013 03:45 PM

Not sure if you can make use of something like this:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

;

data want;

set have;

array t n:;

call sortc(of t(*));

/*length cat $2.;

do over t;

cat=cats(cat,t);

end;*/

run;

proc print;run;

Haikuo

data_null__ · Posted 02-11-2013 03:58 PM

how about CATS(of t

)

Haikuo · Posted 02-11-2013 04:06 PM

Yes, DN. No loop is needed here. Thanks for sharing, Haikuo

KarenGeurts · Posted 02-12-2013 10:00 AM

It works! Thank very much for your help

Karen

KarenGeurts · Posted 02-12-2013 10:00 AM

It works! Thank very much for your help

Karen

Astounding · Posted 02-11-2013 04:16 PM

Hai.kuo (or others?),

If you're interested and have the time, here's an approach. I just don't have the time these days, and you'll definitely be faster at this than I am.

Create a hash table, but rename the keys as they are loaded (dataset:have(rename=(name_1=name_2 name_2=name_1))

Does every hash table require data, or is it just sufficient to create the keys? You might have to manufacture a dummy data element.

Then use the check() method (based on the original values without renaming) to see if a record should be deleted. This will require checking a second condition, such as name_1 > name_2 so that you don't delete both records of a pair.

If nobody tries this, I will eventually. But it might take a month or so.

Haikuo · Posted 02-12-2013 12:53 PM

Astounding, I am overwhelmingly honored, and flattered. Not sure if the following attempt can keep up with your expectation or reflect your thoughts, and it has not been rigorously tested:

data have;

input (n1 n2) (:$1.);

cards;

a b

b a

a b

b c

;

data _null_;

if _n_=1 then do;

if 0 then set have (rename=(n1=n_1 n2=n_2));

declare hash h(dataset:'have (rename=(n1=n_1 n2=n_2))', multidata:'y');

h.definekey('n_1');

h.definedata(all:'y');

h.definedone();

end;

set have end=last;

ct=0;

do rc=h.find(key:n1) by 0 while (rc=0 and n2=n_2);

ct=ct+1;

rc=h.find_next(key:n1);

if ct>1 then h.remove(key:n1);

end;

if ct>1 then ct=1;

do rc=h.find(key:n2) by 0 while (rc=0 and n1=n_1);

ct=ct+1;

rc=h.find_next(key:n2);

if ct>1 then h.remove(key:n2);

end;

if last then h.output(dataset:'want');

run;

Haikuo

Astounding · Posted 02-12-2013 01:16 PM

Hmmm... I was anticipating handling simpler cases where the same combination was not allowed to repeat. So the idea is that there couldn't be multiple incoming observations with 'a b', but there could both 'a b' and 'b a'. I can't fix the syntax, but here's what I was thinking:

data want;

if _n_=1 then do;

if 0 then set have (keep=n1 n2);

declare hash h(dataset:'have (rename=(n2=n1 n1=n2))');

h.definekey('n1', 'n2');

h.definedone();

end;

set have;

if h.check=0 and n1 > n2 then delete;

run;

It's 100% untested at this point, and probably contains a few syntax errors. Can you define a hash table with keys only and no data? I haven't tested.

It requires that n1 and n2 have the same definition (either both numeric, or both character with the same length). But after the bugs are worked out, I think it gives you a simple way to remove the "reverse" combinations, all in one step. (As usual, "simple" is in the eye of the beholder.)

Astounding · Posted 02-15-2013 02:35 PM

Just for the record, I finally had time to test this approach. It works fine, except for one major drawback. SAS objects to:

declare hash h(dataset: 'have (rename=(n2=n1 n1=n2))');

It will perform the same loading just fine if I go through an extra step just to copy over the data set. Here is what worked:

data have;
input n1 $ n2 $ amount;
cards;
a b 1
a b 2
b a 3
a c 4
c d 5
;

data stupid;
   set have;
   drop amount;
   rename n1=n2 n2=n1;
run;

data want;
   if _n_=1 then do;
      if 0 then set have (keep=n1 n2);
     * declare hash h(dataset:'have (rename=(n1=n2 n2=n1))');
      declare hash h (dataset: 'stupid');
      h.definekey('n1', 'n2');
      h.definedone();
   end;
   set have;
   if h.check()=0 and n1 > n2 then delete;
run;

A few notes:

1. Duplicates in the data set used to create the hash table are permitted ... and by default are ignored. That's not an issue here ... we only need to identify all the existing combinations of n1 and n2.

2. There was no problem creating a hash table with keys only and no data elements.

3. Reversing the names before loading the keys seems like an interesting way to approach the problem. The CHECK method is then able to find out whether the reverse order ever appeared in the original data. However, it is still necessary to look for a second condition such as n1 > n2. If you don't do that, then the program would remove both sides of the matched pairs (in this case, would remove all the 'a b' records as well as the 'b a' records).

Classroom Training Available!