Merging two tables by choosing the CLOSEST dates

Accepted Solution Solved
Reply
Contributor
Posts: 66
Accepted Solution

Merging two tables by choosing the CLOSEST dates

[ Edited ]
 

Dear Everyone;

 

Actually, I am trying to obtain 10 control firms for each of my sample firms with year and industry fixed effect. Here. fyear indicates financial year, cusip indicates company id., sic2 indicates industry id, wBAHR1hat_rank indicates ranking of each company in the corresponding financial year based on some measure. Now, my matching procedure pairs the sample firms with adjacent control firms from control firm file in terms of the wBAHR1hat_rank. It means that company cusip 123 will mached to the nearest ranked control firms in the same year and the same industry. The rank of the control firms should be less than the rank of the sample firm.

 

I may require 2 control firms in some cases and also require 10 control firms in some other cases. If i need two control firms, then cusip=123 in year 1990 and industry 12 with its ranking=12 should be matched with cusip 124 and 125 in year 1990 and industry 12. cusip=859 in year 2000 and industry 11 with its ranking=15 should be matched with cusip 868 and 869 in year 1990 and industry 12. I included my desired sample output in case of two control firms here.

 

 

my sample firm file looks like the following:

 

fyear cusip SIC2 wBAHR1hat_rank
1990 123 12 12
2000 859 11 15

 

my control firm file:

 

fyear cusip SIC2 wBAHR1hat_rank
1990 124 12 13
1990 125 12 14
1990 126 12 15
1990 127 12 16
1990 128 12 17
1990 129 12 18
1990 759 13 19
1990 760 15 20
1990 761 12 21
1990 762 12 22
1990 763 12 23
1990 764 12 24
2000 860 11 21
2000 861 11 12
2000 863 11 23
2000 864 11 13
2000 867 11 14
2000 868 11 16
2000 869 11 17
2000 870 12 18
2000 872 14 19
2000 873 11 20
2000 874 11 21
2000 876 11 22
2000 877 11 23

 

my expected output in case of two control firms:

 

fyear cusip SIC2 wBAHR1hat_rank CUSIP_Matching1 CUSIP_Matching2
1990 123 12 12 124 125
2000 859 11 15 868 869

 

 

I am using the following code. But with this code I am getting only one control firm.

 

proc sql;
create table match1 as
select a.*, b.CUSIP as CUSIP_Matching
from joint8 a
left join
joint9 b
on a.fyear=b.fyear and a.sic2=b.sic2
where b.wBAHR1hat_Rank >= a.wBAHR1hat_Rank
group by a.wBAHR1hat_Rank,a.fyear, a.sic2
having abs(a.wBAHR1hat_Rank-b.wBAHR1hat_Rank)=min(abs(a.wBAHR1hat_Rank-b.wBAHR1hat_Rank))
order by a.fyear, b.wBAHR1hat_Rank descending
;
quit;


Accepted Solutions
Solution
‎06-30-2017 11:28 PM
PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

SQL is not very good at this.

I tried the monotonic function in vain, and SAS does not have a rank over() function.

So like this?


proc sql;
  create view _V as 
  select a.*
  ,abs(a.WBAHR1HAT_RANK-b.WBAHR1HAT_RANK) as DIF 
  ,b.CUSIP as CUSIPB 
  from SAMPLE   a
  left join
  CONTROL       b
  on  a.FYEAR           = b.FYEAR 
  and a.SIC2            = b.SIC2
  and b.WBAHR1HAT_RANK >= a.WBAHR1HAT_RANK
  order by 1,2,3,4,5  ;
quit;
data WANT;
  set _V;
  retain N 0 CUSIP_MATCHING1;
  by FYEAR CUSIP SIC2 WBAHR1HAT_RANK;
  N=N*^first.WBAHR1HAT_RANK+1;
  if N=1 then CUSIP_MATCHING1 = CUSIPB;
  if N=2 then do;
    CUSIP_MATCHING2 = CUSIPB;
    output;
  end;
run;

 

FYEAR CUSIP SIC2 WBAHR1HAT_RANK CUSIP_MATCHING1 CUSIP_MATCHING2
1990 123 12 12 124 125
2000 859 11 15 868 869

 

 

View solution in original post


All Replies
PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

Your question is unclear and the desired output is missing.

Which rows out of the possible crossings below do you keep and why?

i don't see how 1/12/2006 makes the cut.

 


proc sql;
  select a.* , b.DATE as DATEB, abs(a.DATE- b.DATE) as DAYS 
  from A
        left join 
       B 
        on a.TICKER=b.TICKER 
  order by a.TICKER, a.DATE, DAYS ; 
quit;

 

DATE TICKER DATEB DAYS
04/06/2008 ABC 01/12/2008 180
04/06/2008 ABC 01/12/2007 186
04/06/2008 ABC 01/12/2009 545
04/06/2008 ABC 01/12/2006 551
04/06/2008 ABC 01/12/2010 910
12/12/2008 ABC 01/12/2008 11
12/12/2008 ABC 01/12/2009 354
12/12/2008 ABC 01/12/2007 377
12/12/2008 ABC 01/12/2010 719
12/12/2008 ABC 01/12/2006 742
06/02/2002 DEF 01/12/2001 67
06/02/2002 DEF 01/12/2007 2124
06/02/2002 DEF 01/12/2008 2490
06/02/2002 DEF 01/12/2009 2855
06/02/2002 DEF 01/12/2010 3220
25/03/2010 DEF 01/12/2009 114
25/03/2010 DEF 01/12/2010 251
25/03/2010 DEF 01/12/2008 479
25/03/2010 DEF 01/12/2007 845
25/03/2010 DEF 01/12/2001 3036

 

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

Dear ChrisNZ,

 

I think I messed up in my question. Actually, I am trying to obtain 10 control firms for each of my sample firms with year and industry fixed effect. Here. fyear indicates financial year, cusip indicates company id., sic2 indicates industry id, wBAHR1hat_rank indicates ranking of each company in the corresponding financial year based on some measure. Now, my matching procedure pairs the sample firms with adjacent control firms from control firm file in terms of the wBAHR1hat_rank. It means that company cusip 123 will mached to the nearest ranked control firms in the same year and the same industry. The rank of the control firms should be less than the rank of the sample firm.

 

I may require 2 control firms in some cases and also require 10 control firms in some other cases. If i need two control firms, then cusip=123 in year 1990 and industry 12 with its ranking=12 should be matched with cusip 124 and 125 in year 1990 and industry 12. cusip=859 in year 2000 and industry 11 with its ranking=15 should be matched with cusip 868 and 869 in year 1990 and industry 12. I included my desired sample output in case of two control firms here.

 

 

my sample firm file looks like the following:

 

fyear cusip SIC2 wBAHR1hat_rank
1990 123 12 12
2000 859 11 15

 

my control firm file:

 

fyear cusip SIC2 wBAHR1hat_rank
1990 124 12 13
1990 125 12 14
1990 126 12 15
1990 127 12 16
1990 128 12 17
1990 129 12 18
1990 759 13 19
1990 760 15 20
1990 761 12 21
1990 762 12 22
1990 763 12 23
1990 764 12 24
2000 860 11 21
2000 861 11 12
2000 863 11 23
2000 864 11 13
2000 867 11 14
2000 868 11 16
2000 869 11 17
2000 870 12 18
2000 872 14 19
2000 873 11 20
2000 874 11 21
2000 876 11 22
2000 877 11 23

 

my expected output in case of two control firms:

 

fyear cusip SIC2 wBAHR1hat_rank CUSIP_Matching1 CUSIP_Matching2
1990 123 12 12 124 125
2000 859 11 15 868 869

 

 

I am using the following code. But with this code I am getting only one control firm.

 

proc sql;
create table match1 as
select a.*, b.CUSIP as CUSIP_Matching
from joint8 a
left join
joint9 b
on a.fyear=b.fyear and a.sic2=b.sic2
where b.wBAHR1hat_Rank >= a.wBAHR1hat_Rank
group by a.wBAHR1hat_Rank,a.fyear, a.sic2
having abs(a.wBAHR1hat_Rank-b.wBAHR1hat_Rank)=min(abs(a.wBAHR1hat_Rank-b.wBAHR1hat_Rank))
order by a.fyear, b.wBAHR1hat_Rank descending
;
quit;

Solution
‎06-30-2017 11:28 PM
PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

SQL is not very good at this.

I tried the monotonic function in vain, and SAS does not have a rank over() function.

So like this?


proc sql;
  create view _V as 
  select a.*
  ,abs(a.WBAHR1HAT_RANK-b.WBAHR1HAT_RANK) as DIF 
  ,b.CUSIP as CUSIPB 
  from SAMPLE   a
  left join
  CONTROL       b
  on  a.FYEAR           = b.FYEAR 
  and a.SIC2            = b.SIC2
  and b.WBAHR1HAT_RANK >= a.WBAHR1HAT_RANK
  order by 1,2,3,4,5  ;
quit;
data WANT;
  set _V;
  retain N 0 CUSIP_MATCHING1;
  by FYEAR CUSIP SIC2 WBAHR1HAT_RANK;
  N=N*^first.WBAHR1HAT_RANK+1;
  if N=1 then CUSIP_MATCHING1 = CUSIPB;
  if N=2 then do;
    CUSIP_MATCHING2 = CUSIPB;
    output;
  end;
run;

 

FYEAR CUSIP SIC2 WBAHR1HAT_RANK CUSIP_MATCHING1 CUSIP_MATCHING2
1990 123 12 12 124 125
2000 859 11 15 868 869

 

 

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

Thank you 

 

PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

Here it is for 4:

data WANT;
  set _V;
  retain N 0 CUSIP_MATCHING1-CUSIP_MATCHING3;
  by FYEAR CUSIP SIC2 WBAHR1HAT_RANK;
  N=N*^first.WBAHR1HAT_RANK+1;
  if N=1 then CUSIP_MATCHING1 = CUSIPB;
  if N=2 then CUSIP_MATCHING2 = CUSIPB;
  if N=3 then CUSIP_MATCHING3 = CUSIPB;
  if N=4 then do;
    CUSIP_MATCHING4 = CUSIPB;
    output;
  end;
run;

You can of course an an array to remove the repetitive bits.

 

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

Thank you so much ChrisNZ. It was a great help. I am grateful to you.

PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

Dear 

 

 

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

I got it. Thank you. I appreciate your help Smiley Happy

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

I am sorry. I have one more question. The code you provided only keeps those sample firms which have matching firms from control firm file. But It removes those sample firms that have no matching firms. But I want to keep all the sample firms in my final output file. How can I do it?

 

My sample file has 5100 observations. But after I apply your code, the output file has 4600 observations because the code is removing those sample firms which have no matching firms from the control firm file.

PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

[ Edited ]

The code only keeps those sample firms which have 12 matching firms from control firm file.

 

Maybe this does what you want?

 

data test;
set _V;
retain N 0 CUSIP_MATCHING1-CUSIP_MATCHING12;
by FYEAR CUSIP SIC2 WBAHR1HAT_RANK;
N=N*^first.WBAHR1HAT_RANK+1;
if N= 1 then CUSIP_MATCHING1  = CUSIPB;
if N= 2 then CUSIP_MATCHING2  = CUSIPB;
if N= 3 then CUSIP_MATCHING3  = CUSIPB;
if N= 4 then CUSIP_MATCHING4  = CUSIPB;
if N= 5 then CUSIP_MATCHING5  = CUSIPB;
if N= 6 then CUSIP_MATCHING6  = CUSIPB;
if N= 7 then CUSIP_MATCHING7  = CUSIPB;
if N= 8 then CUSIP_MATCHING8  = CUSIPB;
if N= 9 then CUSIP_MATCHING9  = CUSIPB;
if N=10 then CUSIP_MATCHING10 = CUSIPB;
if N=11 then CUSIP_MATCHING11 = CUSIPB;
if N=12 then CUSIP_MATCHING12 = CUSIPB;
if N=12 or last.WBAHR1HAT_RANK then output;
end;
run;

 

Please spend the requitred time crafting your question to ensure you have a clear and comprehensive example next time, to save everybody's time. Smiley Happy

 

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

[ Edited ]

Dear  

 

I am sorry for the inconvenience. I tried to make my question comprehensively but somehow missed the point. I ran your code but got the following error. Also, I need to keep all the sample firms including those which have no matching firm. Please do not mind if I am asking too many questions Smiley Sad

 

ERROR 161-185: No matching DO/SELECT statement.

 

242 data sample;
243 input fyear cusip SIC2 wBAHR1hat_rank;
244 cards;

NOTE: The data set WORK.SAMPLE has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


248 ;
249
250 data control;
251 input fyear cusip SIC2 wBAHR1hat_rank;
252 cards;

NOTE: Invalid data for fyear in line 253 1-5.
NOTE: Invalid data for cusip in line 253 7-11.
NOTE: Invalid data for SIC2 in line 253 13-16.
NOTE: Invalid data for wBAHR1hat_rank in line 253 18-31.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9-
253 fyear cusip SIC2 wBAHR1hat_rank
fyear=. cusip=. SIC2=. wBAHR1hat_rank=. _ERROR_=1 _N_=1
NOTE: The data set WORK.CONTROL has 26 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds


279 ;
280
281 proc sql;
282 create table _V as
283 select a.*
284 ,abs(a.WBAHR1HAT_RANK-b.WBAHR1HAT_RANK) as DIF
285 ,b.CUSIP as CUSIPB
286 from SAMPLE a
287 left join
288 CONTROL b
289 on a.FYEAR = b.FYEAR
290 and a.SIC2 = b.SIC2
291 and b.WBAHR1HAT_RANK >= a.WBAHR1HAT_RANK
292 order by 1,2,3,4,5 ;
NOTE: Invalid (or missing) arguments to the ABS function have caused the function to return a missing
value.
NOTE: Table WORK._V created, with 19 rows and 6 columns.

293 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds


294 proc sort data=_V; by FYEAR CUSIP SIC2 WBAHR1HAT_RANK; run;

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


295 data test;
296 set _V;
297 retain N 0 CUSIP_MATCHING1-CUSIP_MATCHING12;
298 by FYEAR CUSIP SIC2 WBAHR1HAT_RANK;
299 N=N*^first.WBAHR1HAT_RANK+1;
300 if N= 1 then CUSIP_MATCHING1 = CUSIPB;
301 if N= 2 then CUSIP_MATCHING2 = CUSIPB;
302 if N= 3 then CUSIP_MATCHING3 = CUSIPB;
303 if N= 4 then CUSIP_MATCHING4 = CUSIPB;
304 if N= 5 then CUSIP_MATCHING5 = CUSIPB;
305 if N= 6 then CUSIP_MATCHING6 = CUSIPB;
306 if N= 7 then CUSIP_MATCHING7 = CUSIPB;
307 if N= 8 then CUSIP_MATCHING8 = CUSIPB;
308 if N= 9 then CUSIP_MATCHING9 = CUSIPB;
309 if N=10 then CUSIP_MATCHING10 = CUSIPB;
310 if N=11 then CUSIP_MATCHING11 = CUSIPB;
311 if N=12 then CUSIP_MATCHING12 = CUSIPB;
312 if N=12 or last.WBAHR1HAT_RANK then output;
313 end;
---
161
ERROR 161-185: No matching DO/SELECT statement.

314 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 0
observations and 19 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

 

Contributor
Posts: 66

Re: Merging two tables by choosing the CLOSEST dates

[ Edited ]

I just modified the code in the following way and the error is not showing up. I have added one extra sample firm in my sample file which has no matched sample . After I run the code, I did not get consistent  result. I have included the picture of my output file at the bottom of the post. As you can see, firm 859 in year 2011 should not have any matched firm but my output shows matched firms.  also firm 859 in year 2000 should not have any cusip_matching9 and cusip_matching9.

 

data sample;
input fyear cusip SIC2 wBAHR1hat_rank;
cards;
1990 123 12 12
2000 859 11 15
2001 859 11 15
;

data control;
input fyear cusip SIC2 wBAHR1hat_rank;
cards;
1990 124 12 13
1990 125 12 14
1990 126 12 15
1990 127 12 16
1990 128 12 17
1990 129 12 18
1990 759 13 19
1990 760 15 20
1990 761 12 21
1990 762 12 22
1990 763 12 23
1990 764 12 24
2000 860 11 21
2000 861 11 12
2000 863 11 23
2000 864 11 13
2000 867 11 14
2000 868 11 16
2000 869 11 17
2000 870 12 18
2000 872 14 19
2000 873 11 20
2000 874 11 21
2000 876 11 22
2000 877 11 23
;

proc sql;
create table _V as
select a.*
,abs(a.WBAHR1HAT_RANK-b.WBAHR1HAT_RANK) as DIF
,b.CUSIP as CUSIPB
from SAMPLE a
left join
CONTROL b
on a.FYEAR = b.FYEAR
and a.SIC2 = b.SIC2
and b.WBAHR1HAT_RANK >= a.WBAHR1HAT_RANK
order by 1,2,3,4,5 ;
quit;
proc sort data=_V; by FYEAR CUSIP SIC2 WBAHR1HAT_RANK; run;
data test;
set _V;
retain N 0 CUSIP_MATCHING1-CUSIP_MATCHING12;
by FYEAR CUSIP SIC2 WBAHR1HAT_RANK;
N=N*^first.WBAHR1HAT_RANK+1;
if N= 1 then CUSIP_MATCHING1 = CUSIPB;
if N= 2 then CUSIP_MATCHING2 = CUSIPB;
if N= 3 then CUSIP_MATCHING3 = CUSIPB;
if N= 4 then CUSIP_MATCHING4 = CUSIPB;
if N= 5 then CUSIP_MATCHING5 = CUSIPB;
if N= 6 then CUSIP_MATCHING6 = CUSIPB;
if N= 7 then CUSIP_MATCHING7 = CUSIPB;
if N= 8 then CUSIP_MATCHING8 = CUSIPB;
if N= 9 then CUSIP_MATCHING9 = CUSIPB;
if N=10 then CUSIP_MATCHING10 = CUSIPB;
if N=11 then CUSIP_MATCHING11 = CUSIPB;
if N=12 then CUSIP_MATCHING12 = CUSIPB;
if N=12 or last.WBAHR1HAT_RANK then do;
output;
end;
run;

 

1.PNG2.PNG

PROC Star
Posts: 1,561

Re: Merging two tables by choosing the CLOSEST dates

Haha just to keep you on your toes...  Smiley Wink

Good on you fixing the error!

You could also just have removed the troublesome end statement.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 349 views
  • 2 likes
  • 2 in conversation