Hello dear SAS community.
I have the following question, there are two variables (id1 and id2) in a table (see below), which should be compared with each other.
Is it possible to delete the matching observations from this table (position of the observation does not matter), so that only the unique observations for these two variables remain there?
id1 | id2 |
990000 | 990000 |
990002 | 990000contour |
990003 | 990002 |
990006 | 990003 |
990007 | 990006 |
990008 | 990007 |
990016 | 990008 |
990018 | 990016 |
990021 | 990018 |
990024 | 990021 |
I would be delighted to receive an answer.
Can you show the code you have tried? Are you familiar with the IF statement, and the DELETE statement?
I cannot figure out what you mean. Can you show what you would want to get for the example input you showed?
Yes, of course. Thanks for the feedback.
I would like to use this data set:
id1 | id2 |
990000 | 990000 |
990002 | 990000contour |
990003 | 990002 |
990006 | 990003 |
990007 | 990006 |
990008 | 990007 |
990016 | 990008 |
990018 | 990016 |
990021 | 990018 |
990024 | 990021 |
to create the following data set:
id1 | id2 |
990024 | 990000contour |
Ultimately, it's about removing the repetitions on both sides of the table (id1 and id2), even if the rows are "offset".
Ok. So know we know what you want as the output, but WHY.
Why would you pair 990024 with 990000contour?
They appear to have nothing to do with each other in the original dataset.
The only way I can see to get those two values out is to treat the values as ONE variable, not TWO.
data list;
input id :$20. @@;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;
proc sql;
create table want as
select id
from list
group by id
having count(*)=1
;
quit;
proc print;
run;
Obs id 1 990000contour 2 990024
Could be solved by a data step:
data want;
merge
have(keep= id1 rename=(id1=id) in=inOne)
have(keep= id2 rename=(id2=id) in=inTwo)
;
by id;
if inOne + inTwo = 1;
run;
EDIT: Both id-variables must be sorted for this step to work, so using one of the other solution is recommended.
As a minor variation, if you want to keep both variables ID1 and ID2, but with a unique row for each unique value, you could:
data work.have;
input id1 :$20. id2 :$20.;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;
data want (label="unmatched ID1's and unmatched ID2's");
set have (keep=id1 rename=id1=id in=in1)
have (keep=id2 rename=id2=id in=in2);
by id;
if first.id=1 and last.id=1;
if in1=1 then id1=id;
else id2=id;
drop id;
run;
proc print;
run;
which generates:
The SAS System |
Obs | id1 | id2 |
---|---|---|
1 | 990000contour | |
2 | 990024 |
Here one option.
data work.have;
input id1 :$20. id2 :$20.;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;
proc sql;
title 'A EXCLUSIVE UNION B';
(
select id1 as id from work.have
except
select id2 as id from work.have
)
union
(
select id2 as id from work.have
except
select id1 as id from work.have
)
;
quit;
Above code based on SAS Docu Producing Rows from the First Query or the Second Query
@Zakharkou wrote:
It looks almost as desired. Would it be possible not to merge the 2 columns, but to leave them as they were with id1 and id2?
Here's a lengthy solution:
data work.have;
input id1 :$20. id2 :$20.;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;
data work.stacked;
set work.have;
array ids[2] id1 id2;
do source = 1 to dim(ids);
id = ids[source];
output;
end;
drop id1 id2;
run;
proc sort data=work.stacked;
by id;
run;
data work.unique;
set work.stacked;
by id;
if first.id and last.id;
run;
proc sort data=work.unique;
by source;
run;
proc transpose data=work.unique out=want(drop=_name_) prefix=id;
id source;
var id;
run;
@Zakharkou wrote:
It looks almost as desired. Would it be possible not to merge the 2 columns, but to leave them as they were with id1 and id2?
Below should work
proc sql;
title 'A EXCLUSIVE UNION B';
(
select id1, ' ' as id2 from work.have
except
select id2 as id1, ' ' as id2 from work.have
)
union all corr
(
select id2, ' ' as id1 from work.have
except
select id1 as id2, ' ' as id1 from work.have
)
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.