BookmarkSubscribeRSS Feed
mbaugh
Calcite | Level 5

Hi,

I have a data set something to the effect of:

ID     Year     Industry

1     2005     1    

1     2006     1

2     2005     1

2     2006     1

3     2005     2

3     2006     2

4     2005     2

4     2006     2

5     2005     2    

5     2006     2

I need to create a data set with all possible firm pairs within industries and years, something to the effect of:

ID1     ID2     Year     Industry

1          2     2005     1

3          4     2005     2

3          5     2005     2

4          5     2005     2   

I've tried copying the original table and merging the copy back to the original using a Cross Join (Cartesian Product) in Proc SQL. The problem with that is it effectively double counts pairs such that it creates data lines for both:

ID1     ID2     Year     Industry

1          2     2005     1

2          1     2005     1

These are duplicate entries for what I need.

Does anyone know how I can pair within a group without using a full Cross Join / Cartesian Product?

Thanks!

6 REPLIES 6
Ksharp
Super User

If I understood what you mean.

data have;
input ID     Year     Industry ;
cards;
1     2005     1    
1     2006     1
2     2005     1
2     2006     1
3     2005     2
3     2006     2
4     2005     2
4     2006     2
5     2005     2    
5     2006     2
;
run;
proc sort data=have;by Year     Industry ;run;
data want;
 set have;
 by Year Industry ; 
 array x{99999} _temporary_;
 if first.Industry then n=0;
 n+1;x{n}=id;
 if last.Industry then do;
   do i=1 to n-1;
    do j=i+1 to n;
      id1=x{i};id2=x{j};output;
     end;
   end;
 end;
drop n i j id;
run;

Xia Keshan

Reeza
Super User

I think your request is the definition of what a cross join is, but you can add a condition to limit the results:

WHERE ID1 LT ID2

OR

WHERE ID1 LE ID2

Haikuo
Onyx | Level 15

Following 's suggestion:

data have;

     input ID     Year Industry;

     cards;

1     2005 1   

1     2006 1

2     2005 1

2     2006 1

3     2005 2

3     2006 2

4     2005 2

4     2006 2

5     2005 2   

5     2006 2

;

run;

proc sql;

     select distinct a.id as id1, b.id as id2, a.year, a.industry

           from have a, have b

                where a.year=b.year and a.industry=b.industry and a.id < b.id

                     order by a.year;

quit;

mbaugh
Calcite | Level 5

Thank you!

Why is the last where statement a.id < b.id and not a.id<>b.id?

ballardw
Super User

Run it and see. Hint: 1 <> 2 and 2<>1.

JMCass
Calcite | Level 5
So simple. Thanks!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1773 views
  • 7 likes
  • 6 in conversation