BookmarkSubscribeRSS Feed
mbaugh
Calcite | Level 5

Hi,

I have a data set something to the effect of:

ID     Year     Industry

1     2005     1    

1     2006     1

2     2005     1

2     2006     1

3     2005     2

3     2006     2

4     2005     2

4     2006     2

5     2005     2    

5     2006     2

I need to create a data set with all possible firm pairs within industries and years, something to the effect of:

ID1     ID2     Year     Industry

1          2     2005     1

3          4     2005     2

3          5     2005     2

4          5     2005     2   

I've tried copying the original table and merging the copy back to the original using a Cross Join (Cartesian Product) in Proc SQL. The problem with that is it effectively double counts pairs such that it creates data lines for both:

ID1     ID2     Year     Industry

1          2     2005     1

2          1     2005     1

These are duplicate entries for what I need.

Does anyone know how I can pair within a group without using a full Cross Join / Cartesian Product?

Thanks!

6 REPLIES 6
Ksharp
Super User

If I understood what you mean.

data have;
input ID     Year     Industry ;
cards;
1     2005     1    
1     2006     1
2     2005     1
2     2006     1
3     2005     2
3     2006     2
4     2005     2
4     2006     2
5     2005     2    
5     2006     2
;
run;
proc sort data=have;by Year     Industry ;run;
data want;
 set have;
 by Year Industry ; 
 array x{99999} _temporary_;
 if first.Industry then n=0;
 n+1;x{n}=id;
 if last.Industry then do;
   do i=1 to n-1;
    do j=i+1 to n;
      id1=x{i};id2=x{j};output;
     end;
   end;
 end;
drop n i j id;
run;

Xia Keshan

Reeza
Super User

I think your request is the definition of what a cross join is, but you can add a condition to limit the results:

WHERE ID1 LT ID2

OR

WHERE ID1 LE ID2

Haikuo
Onyx | Level 15

Following 's suggestion:

data have;

     input ID     Year Industry;

     cards;

1     2005 1   

1     2006 1

2     2005 1

2     2006 1

3     2005 2

3     2006 2

4     2005 2

4     2006 2

5     2005 2   

5     2006 2

;

run;

proc sql;

     select distinct a.id as id1, b.id as id2, a.year, a.industry

           from have a, have b

                where a.year=b.year and a.industry=b.industry and a.id < b.id

                     order by a.year;

quit;

mbaugh
Calcite | Level 5

Thank you!

Why is the last where statement a.id < b.id and not a.id<>b.id?

ballardw
Super User

Run it and see. Hint: 1 <> 2 and 2<>1.

JMCass
Calcite | Level 5
So simple. Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 995 views
  • 7 likes
  • 6 in conversation