Solved: Re: How to compare values in two columns regardless of the row orders?

almmotamedi · Posted 02-22-2016 05:11 PM

Hi SAS users,

I have two columns with the same type character values. I just want to check if there is any duplicate between thease two columns (when I say duplicate, I don't mean duplicate at each row; I mean any repeated value in both columns regardless of the order.) Could you please help with it?

Thank you

FreelanceReinh · Posted 02-23-2016 04:13 AM

Hi @almmotamedi,

Thanks for the clarification. If you want to create a new dataset (call it WANT) with one variable, New_column1, from the existing dataset (call it HAVE) with the two variables, you could use the EXCEPT operator in PROC SQL:

proc sql;
create table want as
select column1 as New_column1 from have
except
select column2 from have;
quit;

This assumes that, say, "B" should not occur in New_column1 even if it occurred twice in column1, but only once in column2. If, in the latter case, you would like to take into account the "multiplicities", i.e. eliminate only one of the two occurrences of "B", just replace except by except all in the code above.

View solution in original post

Reeza · Posted 02-22-2016 05:50 PM

What do you want the output to be?

almmotamedi · Posted 02-22-2016 10:09 PM

Its simply like this:

column1 column2

A B

B C

C

D

I would like to create the New_column1 as below:

New_column1

A

D

(Because, A and D are not repeated/duplicate in the 2nd column.)

FreelanceReinh · Posted 02-23-2016 04:13 AM

Hi @almmotamedi,

Thanks for the clarification. If you want to create a new dataset (call it WANT) with one variable, New_column1, from the existing dataset (call it HAVE) with the two variables, you could use the EXCEPT operator in PROC SQL:

proc sql;
create table want as
select column1 as New_column1 from have
except
select column2 from have;
quit;

This assumes that, say, "B" should not occur in New_column1 even if it occurred twice in column1, but only once in column2. If, in the latter case, you would like to take into account the "multiplicities", i.e. eliminate only one of the two occurrences of "B", just replace except by except all in the code above.

almmotamedi · Posted 02-23-2016 09:07 AM

Thank you so much

FreelanceReinh · Posted 02-22-2016 06:07 PM

Hi @almmotamedi,

Do you mean something like this?

data have;
length var1 var2 $10;
input var1 var2;
cards;
blue   red
green  yellow
red    brown
black  green
;

proc sql;
create table dups as
select a.var1
from have a, have b
where a.var1=b.var2;
quit;

proc print noobs;
run;

Result:

var1

green
red

mohamed_zaki · Posted 02-22-2016 06:35 PM

You may need to

select distinct a.var1

if the value could be repeated in each column more than one time. Or you will end up by long list.

Ksharp · Posted 02-22-2016 08:29 PM

It looks like you need some QUERY tool. MERGE, Hash Table, SQL , IML ,Array ..... all can do that.

Here is the IML code :

data have;
length var1 var2 $10;
input var1 var2;
cards;
blue   red
green  yellow
red    brown
black  green
;
run;
proc iml;
use have;
read all var {var1 var2};
close have;
x=var1[loc(element(var1,var2))];
print x;
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away