BookmarkSubscribeRSS Feed
mbaugh
Calcite | Level 5

Hi,

I have a data set something to the effect of:

ID     Year     Industry

1     2005     1    

1     2006     1

2     2005     1

2     2006     1

3     2005     2

3     2006     2

4     2005     2

4     2006     2

5     2005     2    

5     2006     2

I need to create a data set with all possible firm pairs within industries and years, something to the effect of:

ID1     ID2     Year     Industry

1          2     2005     1

3          4     2005     2

3          5     2005     2

4          5     2005     2   

I've tried copying the original table and merging the copy back to the original using a Cross Join (Cartesian Product) in Proc SQL. The problem with that is it effectively double counts pairs such that it creates data lines for both:

ID1     ID2     Year     Industry

1          2     2005     1

2          1     2005     1

These are duplicate entries for what I need.

Does anyone know how I can pair within a group without using a full Cross Join / Cartesian Product?

Thanks!

6 REPLIES 6
Ksharp
Super User

If I understood what you mean.

data have;
input ID     Year     Industry ;
cards;
1     2005     1    
1     2006     1
2     2005     1
2     2006     1
3     2005     2
3     2006     2
4     2005     2
4     2006     2
5     2005     2    
5     2006     2
;
run;
proc sort data=have;by Year     Industry ;run;
data want;
 set have;
 by Year Industry ; 
 array x{99999} _temporary_;
 if first.Industry then n=0;
 n+1;x{n}=id;
 if last.Industry then do;
   do i=1 to n-1;
    do j=i+1 to n;
      id1=x{i};id2=x{j};output;
     end;
   end;
 end;
drop n i j id;
run;

Xia Keshan

Reeza
Super User

I think your request is the definition of what a cross join is, but you can add a condition to limit the results:

WHERE ID1 LT ID2

OR

WHERE ID1 LE ID2

Haikuo
Onyx | Level 15

Following 's suggestion:

data have;

     input ID     Year Industry;

     cards;

1     2005 1   

1     2006 1

2     2005 1

2     2006 1

3     2005 2

3     2006 2

4     2005 2

4     2006 2

5     2005 2   

5     2006 2

;

run;

proc sql;

     select distinct a.id as id1, b.id as id2, a.year, a.industry

           from have a, have b

                where a.year=b.year and a.industry=b.industry and a.id < b.id

                     order by a.year;

quit;

mbaugh
Calcite | Level 5

Thank you!

Why is the last where statement a.id < b.id and not a.id<>b.id?

ballardw
Super User

Run it and see. Hint: 1 <> 2 and 2<>1.

JMCass
Calcite | Level 5
So simple. Thanks!

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1717 views
  • 7 likes
  • 6 in conversation