BookmarkSubscribeRSS Feed
mbaugh
Calcite | Level 5

Hi,

I have a data set something to the effect of:

ID     Year     Industry

1     2005     1    

1     2006     1

2     2005     1

2     2006     1

3     2005     2

3     2006     2

4     2005     2

4     2006     2

5     2005     2    

5     2006     2

I need to create a data set with all possible firm pairs within industries and years, something to the effect of:

ID1     ID2     Year     Industry

1          2     2005     1

3          4     2005     2

3          5     2005     2

4          5     2005     2   

I've tried copying the original table and merging the copy back to the original using a Cross Join (Cartesian Product) in Proc SQL. The problem with that is it effectively double counts pairs such that it creates data lines for both:

ID1     ID2     Year     Industry

1          2     2005     1

2          1     2005     1

These are duplicate entries for what I need.

Does anyone know how I can pair within a group without using a full Cross Join / Cartesian Product?

Thanks!

6 REPLIES 6
Ksharp
Super User

If I understood what you mean.

data have;
input ID     Year     Industry ;
cards;
1     2005     1    
1     2006     1
2     2005     1
2     2006     1
3     2005     2
3     2006     2
4     2005     2
4     2006     2
5     2005     2    
5     2006     2
;
run;
proc sort data=have;by Year     Industry ;run;
data want;
 set have;
 by Year Industry ; 
 array x{99999} _temporary_;
 if first.Industry then n=0;
 n+1;x{n}=id;
 if last.Industry then do;
   do i=1 to n-1;
    do j=i+1 to n;
      id1=x{i};id2=x{j};output;
     end;
   end;
 end;
drop n i j id;
run;

Xia Keshan

Reeza
Super User

I think your request is the definition of what a cross join is, but you can add a condition to limit the results:

WHERE ID1 LT ID2

OR

WHERE ID1 LE ID2

Haikuo
Onyx | Level 15

Following 's suggestion:

data have;

     input ID     Year Industry;

     cards;

1     2005 1   

1     2006 1

2     2005 1

2     2006 1

3     2005 2

3     2006 2

4     2005 2

4     2006 2

5     2005 2   

5     2006 2

;

run;

proc sql;

     select distinct a.id as id1, b.id as id2, a.year, a.industry

           from have a, have b

                where a.year=b.year and a.industry=b.industry and a.id < b.id

                     order by a.year;

quit;

mbaugh
Calcite | Level 5

Thank you!

Why is the last where statement a.id < b.id and not a.id<>b.id?

ballardw
Super User

Run it and see. Hint: 1 <> 2 and 2<>1.

JMCass
Calcite | Level 5
So simple. Thanks!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1469 views
  • 7 likes
  • 6 in conversation