Help using Base SAS procedures

Matching Observations / Cartesian Product w/o Duplicates

Reply
Occasional Contributor
Posts: 5

Matching Observations / Cartesian Product w/o Duplicates

Hi,

I have a data set something to the effect of:

ID     Year     Industry

1     2005     1    

1     2006     1

2     2005     1

2     2006     1

3     2005     2

3     2006     2

4     2005     2

4     2006     2

5     2005     2    

5     2006     2

I need to create a data set with all possible firm pairs within industries and years, something to the effect of:

ID1     ID2     Year     Industry

1          2     2005     1

3          4     2005     2

3          5     2005     2

4          5     2005     2   

I've tried copying the original table and merging the copy back to the original using a Cross Join (Cartesian Product) in Proc SQL. The problem with that is it effectively double counts pairs such that it creates data lines for both:

ID1     ID2     Year     Industry

1          2     2005     1

2          1     2005     1

These are duplicate entries for what I need.

Does anyone know how I can pair within a group without using a full Cross Join / Cartesian Product?

Thanks!

Super User
Posts: 9,662

Re: Matching Observations / Cartesian Product w/o Duplicates

If I understood what you mean.

data have;
input ID     Year     Industry ;
cards;
1     2005     1    
1     2006     1
2     2005     1
2     2006     1
3     2005     2
3     2006     2
4     2005     2
4     2006     2
5     2005     2    
5     2006     2
;
run;
proc sort data=have;by Year     Industry ;run;
data want;
 set have;
 by Year Industry ; 
 array x{99999} _temporary_;
 if first.Industry then n=0;
 n+1;x{n}=id;
 if last.Industry then do;
   do i=1 to n-1;
    do j=i+1 to n;
      id1=x{i};id2=x{j};output;
     end;
   end;
 end;
drop n i j id;
run;

Xia Keshan

Super User
Posts: 17,745

Re: Matching Observations / Cartesian Product w/o Duplicates

I think your request is the definition of what a cross join is, but you can add a condition to limit the results:

WHERE ID1 LT ID2

OR

WHERE ID1 LE ID2

Respected Advisor
Posts: 3,124

Re: Matching Observations / Cartesian Product w/o Duplicates

Following 's suggestion:

data have;

     input ID     Year Industry;

     cards;

1     2005 1   

1     2006 1

2     2005 1

2     2006 1

3     2005 2

3     2006 2

4     2005 2

4     2006 2

5     2005 2   

5     2006 2

;

run;

proc sql;

     select distinct a.id as id1, b.id as id2, a.year, a.industry

           from have a, have b

                where a.year=b.year and a.industry=b.industry and a.id < b.id

                     order by a.year;

quit;

Occasional Contributor
Posts: 5

Re: Matching Observations / Cartesian Product w/o Duplicates

Thank you!

Why is the last where statement a.id < b.id and not a.id<>b.id?

Super User
Posts: 10,464

Re: Matching Observations / Cartesian Product w/o Duplicates

Run it and see. Hint: 1 <> 2 and 2<>1.

Ask a Question
Discussion stats
  • 5 replies
  • 254 views
  • 6 likes
  • 5 in conversation