Calcite | Level 5

## Matching Observations / Cartesian Product w/o Duplicates

Hi,

I have a data set something to the effect of:

ID     Year     Industry

1     2005     1

1     2006     1

2     2005     1

2     2006     1

3     2005     2

3     2006     2

4     2005     2

4     2006     2

5     2005     2

5     2006     2

I need to create a data set with all possible firm pairs within industries and years, something to the effect of:

ID1     ID2     Year     Industry

1          2     2005     1

3          4     2005     2

3          5     2005     2

4          5     2005     2

I've tried copying the original table and merging the copy back to the original using a Cross Join (Cartesian Product) in Proc SQL. The problem with that is it effectively double counts pairs such that it creates data lines for both:

ID1     ID2     Year     Industry

1          2     2005     1

2          1     2005     1

These are duplicate entries for what I need.

Does anyone know how I can pair within a group without using a full Cross Join / Cartesian Product?

Thanks!

6 REPLIES 6
Super User

## Re: Matching Observations / Cartesian Product w/o Duplicates

If I understood what you mean.

```data have;
input ID     Year     Industry ;
cards;
1     2005     1
1     2006     1
2     2005     1
2     2006     1
3     2005     2
3     2006     2
4     2005     2
4     2006     2
5     2005     2
5     2006     2
;
run;
proc sort data=have;by Year     Industry ;run;
data want;
set have;
by Year Industry ;
array x{99999} _temporary_;
if first.Industry then n=0;
n+1;x{n}=id;
if last.Industry then do;
do i=1 to n-1;
do j=i+1 to n;
id1=x{i};id2=x{j};output;
end;
end;
end;
drop n i j id;
run;

```

Xia Keshan

Super User

## Re: Matching Observations / Cartesian Product w/o Duplicates

I think your request is the definition of what a cross join is, but you can add a condition to limit the results:

WHERE ID1 LT ID2

OR

WHERE ID1 LE ID2

Onyx | Level 15

## Re: Matching Observations / Cartesian Product w/o Duplicates

Following 's suggestion:

data have;

input ID     Year Industry;

cards;

1     2005 1

1     2006 1

2     2005 1

2     2006 1

3     2005 2

3     2006 2

4     2005 2

4     2006 2

5     2005 2

5     2006 2

;

run;

proc sql;

select distinct a.id as id1, b.id as id2, a.year, a.industry

from have a, have b

where a.year=b.year and a.industry=b.industry and a.id < b.id

order by a.year;

quit;

Calcite | Level 5

## Re: Matching Observations / Cartesian Product w/o Duplicates

Thank you!

Why is the last where statement a.id < b.id and not a.id<>b.id?

Super User

## Re: Matching Observations / Cartesian Product w/o Duplicates

Run it and see. Hint: 1 <> 2 and 2<>1.

Calcite | Level 5

## Re: Matching Observations / Cartesian Product w/o Duplicates

So simple. Thanks!
Discussion stats
• 6 replies
• 859 views
• 7 likes
• 6 in conversation