My first column is name_1 and second column is name_2. All other columns contain variables relating to the combination of name_1 and name_2.
The order of name_1 and name_2 is arbitrary.
My dataset contains identical rows where only name_1 and name_2 are in reverse order. I need to delete these rows.
Is it possible to concatenate name_1 and name_2 in alphabetic order? In that case I could delete duplicated rows where only name_1 and name_2 are in reverse order.
Thanks for your advise,
Karen
Not sure if you can make use of something like this:
data have;
input (n1 n2) (:$1.);
cards;
a b
b a
;
data want;
set have;
array t n:;
call sortc(of t(*));
/*length cat $2.;
do over t;
cat=cats(cat,t);
end;*/
run;
proc print;run;
Haikuo
Not sure if you can make use of something like this:
data have;
input (n1 n2) (:$1.);
cards;
a b
b a
;
data want;
set have;
array t n:;
call sortc(of t(*));
/*length cat $2.;
do over t;
cat=cats(cat,t);
end;*/
run;
proc print;run;
Haikuo
how about CATS(of t
Yes, DN. No loop is needed here. Thanks for sharing, Haikuo
It works! Thank very much for your help
Karen
It works! Thank very much for your help
Karen
Hai.kuo (or others?),
If you're interested and have the time, here's an approach. I just don't have the time these days, and you'll definitely be faster at this than I am.
Create a hash table, but rename the keys as they are loaded (dataset:have(rename=(name_1=name_2 name_2=name_1))
Does every hash table require data, or is it just sufficient to create the keys? You might have to manufacture a dummy data element.
Then use the check() method (based on the original values without renaming) to see if a record should be deleted. This will require checking a second condition, such as name_1 > name_2 so that you don't delete both records of a pair.
If nobody tries this, I will eventually. But it might take a month or so.
Astounding, I am overwhelmingly honored, and flattered. Not sure if the following attempt can keep up with your expectation or reflect your thoughts, and it has not been rigorously tested:
data have;
input (n1 n2) (:$1.);
cards;
a b
b a
a b
b c
;
data _null_;
if _n_=1 then do;
if 0 then set have (rename=(n1=n_1 n2=n_2));
declare hash h(dataset:'have (rename=(n1=n_1 n2=n_2))', multidata:'y');
h.definekey('n_1');
h.definedata(all:'y');
h.definedone();
end;
set have end=last;
ct=0;
do rc=h.find(key:n1) by 0 while (rc=0 and n2=n_2);
ct=ct+1;
rc=h.find_next(key:n1);
if ct>1 then h.remove(key:n1);
end;
if ct>1 then ct=1;
do rc=h.find(key:n2) by 0 while (rc=0 and n1=n_1);
ct=ct+1;
rc=h.find_next(key:n2);
if ct>1 then h.remove(key:n2);
end;
if last then h.output(dataset:'want');
run;
Haikuo
Hmmm... I was anticipating handling simpler cases where the same combination was not allowed to repeat. So the idea is that there couldn't be multiple incoming observations with 'a b', but there could both 'a b' and 'b a'. I can't fix the syntax, but here's what I was thinking:
data want;
if _n_=1 then do;
if 0 then set have (keep=n1 n2);
declare hash h(dataset:'have (rename=(n2=n1 n1=n2))');
h.definekey('n1', 'n2');
h.definedone();
end;
set have;
if h.check=0 and n1 > n2 then delete;
run;
It's 100% untested at this point, and probably contains a few syntax errors. Can you define a hash table with keys only and no data? I haven't tested.
It requires that n1 and n2 have the same definition (either both numeric, or both character with the same length). But after the bugs are worked out, I think it gives you a simple way to remove the "reverse" combinations, all in one step. (As usual, "simple" is in the eye of the beholder.)
Just for the record, I finally had time to test this approach. It works fine, except for one major drawback. SAS objects to:
declare hash h(dataset: 'have (rename=(n2=n1 n1=n2))');
It will perform the same loading just fine if I go through an extra step just to copy over the data set. Here is what worked:
data have;
input n1 $ n2 $ amount;
cards;
a b 1
a b 2
b a 3
a c 4
c d 5
;
data stupid;
set have;
drop amount;
rename n1=n2 n2=n1;
run;
data want;
if _n_=1 then do;
if 0 then set have (keep=n1 n2);
* declare hash h(dataset:'have (rename=(n1=n2 n2=n1))');
declare hash h (dataset: 'stupid');
h.definekey('n1', 'n2');
h.definedone();
end;
set have;
if h.check()=0 and n1 > n2 then delete;
run;
A few notes:
1. Duplicates in the data set used to create the hash table are permitted ... and by default are ignored. That's not an issue here ... we only need to identify all the existing combinations of n1 and n2.
2. There was no problem creating a hash table with keys only and no data elements.
3. Reversing the names before loading the keys seems like an interesting way to approach the problem. The CHECK method is then able to find out whether the reverse order ever appeared in the original data. However, it is still necessary to look for a second condition such as n1 > n2. If you don't do that, then the program would remove both sides of the matched pairs (in this case, would remove all the 'a b' records as well as the 'b a' records).
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.