BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I am having a problem with getting a code for my event study.

1) I have two datasets, Universe and Sample.
-Universe dataset contains ALL companies quarters.
-Sample dataset contains merger & acquisition acquirers and their DEAL DATES.
2) I want to combine the two datasets as: for each acquirer, I want the data of 4 quarters BEFORE the DEAL DATE. (i.e. the last 4 observations before the M&A date).

Malek
7 REPLIES 7
Cynthia_sas
SAS Super FREQ
Hi:
A bit more information is needed. For example...

With the Universe dataset -- how is the quarter information represented
--with a separate year variable and a separate quarter variable
--with a single date variable
--is the date information in numeric or character form

With the Sample dataset -- how is the DEAL Date represented
--with a separate year variable, month variable and day variable
--with a single date variable
--is the date information in numeric or character form

What is the common variable or variables for joining, combining or merging the 2 datasets together???

When you say you are looking for the last 4 observations before the M&A date, it sounds like every quarter is a separate observation in the Universe file. Is this true?

Can you show the program or SQL query that you have tried?

Can you show a sample of the data from the two datasets or make up some dummy data that shows the structure of the 2 files?

For posting data or maintaining the indention of code, this previous forum posting is very helpful:
http://support.sas.com/forums/thread.jspa?messageID=27609毙

cynthia
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Additional question back to the OP (unless Cynthia already asked it):

-- how do you correlate/identify acquired companies and who acquired them? Is there some variable/column containing "Acquired_By"?

Definitely need to see more input data descriptive information, as in file/column layout and a sample subset of data values from the OP, submitted as a post/reply using PASTE (preferred over hand-keyed for accuracy). Of course, any SAS code already written by the OP would be helpful.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
I tried a code but I believe it is NOT effecient at all!!
Anyway, here it is:

First, I created two new variables:
1) in UNIVERSE dataset I created "agvkey_datadate" which is equal to [(agvkey*100000)+datadate after converted into general number on Excel]
2) in SAMPLE dataset I created "agvkey_dealdate" which is equal to [(agvkey*100000)+dealdate after converted into general number on Excel].

Second, I did this code starting by merging as follows:

[pre]
*merging UNIVERSE with SAMPLE file;
data FINAL;
merge UNIVERSE SAMPLE;
by agvkey;
run;
*CREATING four separate quarters datasets;
*and cleaning the merged file to get the SAMPLE data 1 quarter before dealdate;
data SAMPLE_Q1;
set FINAL;
if (agvkey_dealdate-90)>agvkey_datadate then delete;
if agvkey_datadate>agvkey_dealdate then delete;
run;
*removing duplicates for the near cutoff points in Q1;
data SAMPLE_Q1;
set SAMPLE_Q1;
if agvkey_datadate=2006437621 or
agvkey_datadate=13760038748 then delete;
run;

*cleaning the merged file to get the SAMPLE data 2 quarters before dealdate;
data SAMPLE_Q2;
set FINAL;
if (agvkey_dealdate-184)>agvkey_datadate then delete;
if agvkey_datadate>(agvkey_dealdate-90) then delete;
run;
*removing Duplicates for the near cutoff points in Q2;
data SAMPLE_Q2;
set SAMPLE_Q2;
if agvkey_datadate=597336160 or
..........etc
agvkey_datadate=187839386
then delete;
run;
*cleaning the merged file to get the SAMPLE data 3 quarters before dealdate;
data SAMPLE_Q3;
set FINAL;
if (agvkey_dealdate-276)>agvkey_datadate then delete;
if agvkey_datadate>(agvkey_dealdate-182) then delete;
run;
*removing Duplicates for the near cutoff points in Q3;
data SAMPLE_Q3;
set SAMPLE_Q3;
if agvkey_datadate=2254635915 or
.........etc
agvkey_datadate=187839294
then delete;
run;
*cleaning the merged file to get the SAMPLE data 2 quarters before dealdate;
data SAMPLE_Q4;
set FINAL;
if (agvkey_dealdate-366)>agvkey_datadate then delete;
if agvkey_datadate>(agvkey_dealdate-273) then delete;
run;
*removing Duplicates for the near cutoff points in Q4;
data SAMPLE_Q4;
set SAMPLE_Q4;
if agvkey_datadate=6263435885 or
............etc
agvkey_datadate=6308039172
then delete;
run;
[/pre]

As you can see, date calculation in this code is faulty and it can bring TWO quarters to some firms which their deal date is around a cutoff point of calendar quarters.

Malek
deleted_user
Not applicable
Thanks Cynthia;

Universe dataset -- how is the quarter information represented
--with a single date variable. However the table shows two equivelant date representations:
1) "datadate" Character form like 31-Mar-99, and
2)"datacqtr" Numeric form like 1999.1 (means quarter 1 year 1999)


Sample dataset -- how is the DEAL Date represented
--with a single date variable. "dealdate" and it's Numeric like 12DEC2008

Coming to the common variable, both datasets have gvkey to represent the acquirers (I mean firms).

Yes this is true. Each observation is a firm-quarter.

UNIVERSE:
[pre]
Obs gvkey datadate datacqtr sic2 lag4_roa

1 1235 30-Jun-99 1999.2 22 0.01534
2 1235 30-Sep-99 1999.3 22 0.02700
3 1235 31-Mar-99 1999.1 22 0.01663
4 1278 30-Jun-00 2000.2 23 -0.01379
5 1278 30-Jun-99 1999.2 23 -0.09581
6 1278 30-Sep-00 2000.3 23 -0.01427
7 1278 30-Sep-99 1999.3 23 -0.05392
8 1278 31-Dec-99 1999.4 23 -0.09128
9 1278 31-Mar-00 2000.1 23 -0.01787
10 1278 31-Mar-99 1999.1 23 -0.01613
11 1496 30-Jun-00 2000.2 25 0.06021
12 1496 30-Jun-01 2001.2 25 0.05269
13 1496 30-Jun-02 2002.2 25 0.05435
14 1496 30-Jun-03 2003.2 25 0.02923
15 1496 30-Jun-04 2004.2 25 0.02190
16 1496 30-Jun-05 2005.2 25 0.01718
17 1496 30-Jun-06 2006.2 25 -0.01955
18 1496 30-Jun-07 2007.2 25 0.00963
19 1496 30-Jun-08 2008.2 25 -0.00479
20 1496 30-Jun-99 1999.2 25 0.08410
21 1496 30-Sep-00 2000.3 25 0.03563
22 1496 30-Sep-01 2001.3 25 0.03606
23 1496 30-Sep-02 2002.3 25 0.02185
24 1496 30-Sep-03 2003.3 25 0.02684
25 1496 30-Sep-04 2004.3 25 0.01826
26 1496 30-Sep-05 2005.3 25 0.06419
27 1496 30-Sep-06 2006.3 25 -0.03246
28 1496 30-Sep-07 2007.3 25 0.00816
29 1496 30-Sep-08 2008.3 25 -0.03550
30 1496 30-Sep-99 1999.3 25 0.08177
31 1496 31-Dec-00 2000.4 25 0.04078
32 1496 31-Dec-01 2001.4 25 0.06475
33 1496 31-Dec-02 2002.4 25 0.02916
34 1496 31-Dec-03 2003.4 25 0.02524
35 1496 31-Dec-04 2004.4 25 0.02744
36 1496 31-Dec-05 2005.4 25 -0.00787
37 1496 31-Dec-06 2006.4 25 -0.09617
38 1496 31-Dec-07 2007.4 25 -0.00332
39 1496 31-Dec-99 1999.4 25 0.06719
40 1496 31-Mar-00 2000.1 25 0.04244
41 1496 31-Mar-01 2001.1 25 0.03444
42 1496 31-Mar-02 2002.1 25 0.04240
43 1496 31-Mar-03 2003.1 25 0.02642
44 1496 31-Mar-04 2004.1 25 0.01729
45 1496 31-Mar-05 2005.1 25 0.02409
46 1496 31-Mar-06 2006.1 25 -0.26155
47 1496 31-Mar-07 2007.1 25 0.02740
48 1496 31-Mar-08 2008.1 25 -0.00994
49 1496 31-Mar-99 1999.1 25 0.07676
50 1992 30-Jun-00 2000.2 24 0.01341

[/pre]

SAMPLE:
[pre] Obs dealdate Acq gvkey id

1 12DEC2008 International Tech Sys Inc . 1
2 09DEC2008 AO Smith Corp 9771 2
3 01DEC2008 Johnson & Johnson 6266 3
4 24NOV2008 Kratos Defense & Security 126056 4
5 24NOV2008 Johnson & Johnson 6266 5
6 28OCT2008 Clinical Data Inc 3116 6
7 27OCT2008 CenturyTel Inc 2884 7
8 06OCT2008 Eli Lilly & Co 65152 8
9 24SEP2008 Ligand Pharmaceuticals Inc 12495 9
10 22SEP2008 McAfee Inc 14401 10
11 15SEP2008 Best Buy Co Inc 2184 11
12 03SEP2008 Park City Group Inc 116166 13
13 02SEP2008 Teradyne Inc 10453 14
14 22AUG2008 King Pharmaceuticals Inc 112033 16
15 12AUG2008 The Middleby Corp 13570 17
16 12AUG2008 CVS Caremark Corp 28654 18
17 07AUG2008 Stream Global Services Inc 178633 19
18 05AUG2008 Planet Hollywood International 62676 20
19 28JUL2008 Sempra Energy Inc 8272 21
20 21JUL2008 Brocade Commun Sys Inc 120774 22
21 17JUL2008 ON Semiconductor Corp 18914 23
22 16JUL2008 Convergys Corp 149697 24
23 11JUL2008 Ashland Inc 1794 26
24 10JUL2008 Access Pharmaceuticals Inc 12713 27
25 10JUL2008 GSI Group Inc 61356 28
26 09JUL2008 TranSwitch Corp 60846 29
27 08JUL2008 Eli Lilly & Co 122895 30
28 26JUN2008 Liberty Media Corp 174168 31
29 23JUN2008 Republic Services Inc 112168 32
30 19JUN2008 TIBCO Software Inc 122061 34
31 09JUN2008 CytRx Corp 13184 36
32 09JUN2008 Hologic Inc 65908 37
33 03JUN2008 Smith International Inc 9772 38
34 29MAY2008 Blackbaud Inc 260893 39
35 28MAY2008 Bristol-Myers Squibb Co 162502 40
36 27MAY2008 Provide Commerce Inc 156415 41
37 16MAY2008 Finisar Corp 126417 43
38 15MAY2008 CBS Corp 13714 44
39 13MAY2008 Hewlett-Packard Co 4537 45
40 12MAY2008 Comtech Telecommun Corp 3358 46
41 08MAY2008 DG FastChannel Inc 62016 47
42 01MAY2008 Autodesk Inc 1878 48
43 30APR2008 Stone Energy Corp 64835 49
44 30APR2008 United Online Inc 124198 50
45 22APR2008 Medtronic Inc 178526 51
46 21APR2008 Blue Coat Systems Inc 126615 52
47 15APR2008 Boston Scientific Corp 119214 54
48 14APR2008 Delta Air Lines Inc 2605 55
49 11APR2008 Cypress Semiconductor Corp 12215 57
[/pre]

Malek
Cynthia_sas
SAS Super FREQ
Hi:
Is GVKEY character or numeric? Is it the same in both files? It looks, from this sample of data, as though there are no matches in Universe and Sample. What is the desired treatment for the instances where GVKEY is missing (as you show in OBS 1 in the SAMPLE data?

Is this the correct logic:
If there are fewer than 4 quarters when GVKEY matches will you want ALL the quarters??? If there are more than 4 quarters when GVKEY matches, you only want the 4 quarters before DEAL DATE -- is that INCLUDING the quarter that DEAL DATE is in or the 4 quarters BEFORE DEAL DATE??? For example, if DEAL DATE is Jan 15, 2000, that's 2000.1 -- would you want:
Option 1:
2009.2, 2009.3, 2009.4 and 2000.1
or
Option 2:
2009.1, 2009.2, 2009.3 and 2009.4

When you say that DEALDATE is numeric, do you mean that it is a SAS date value. so that if you look at 12DEC2008 without a date format, you see a number like: 17878 (which is how SAS would internally represent 12DEC2008)?

What code have you already tried? Have you tried an SQL join? Have you tried a Data step merge?

With either approach, if you were going to use any of the SAS date functions to help in your selection, your character date strings would need to be converted to numeric date values (using the INPUT function). Then you could do interval calculations usign the INTNX function. Seeing the code you've tried would be useful -- do you have a preferred method...SQL vs DATA step???

cynthia
deleted_user
Not applicable
Hey,

GVKEY is numeric and it is the same in both files. It is a company identifier.

There are no matches between the files because I cropped some of the UNIVERSE but here with me a full file contains ALL companies/GVKEYs. While SAMPLE contains ONLY some selected GVKEYs.

The logic you are suggesting is "nearly" correct:
-If there are fewer than 4 quarters when GVKEY matches will you want ALL the quarters??? IF the quarters older than dealdate.
-If there are more than 4 quarters when GVKEY matches, you only want the 4 quarters before DEAL DATE. YES that's correct.
- and this is EXCLUDING the quarter of the deal date.
For example: if DEAL DATE is Jan 15 2000, that's 2000.1 then I need the data that have DATA DATE:
1999.4, 1999.3, 1999.2, 1999.1.

DEALDATE is numeric as a SAS date format (DATE9.).

The code I tried is posted up already. (the stupid and ineffecient one!).

I prefer DATA step because I'm a beginner on SAS.

Loads of thanks
Malek.
Cynthia_sas
SAS Super FREQ
Hi:
I believe that your approach might make a few more passes through the data than is automatically necessary. Let's deal with some "dummy" data. For example, if you had a file like this, you could easily see which 4 observations to "grab" for each GVKEY. In the fake data below, I made some dummy deal dates:
[pre]
gvkey dealdate obsdate qtrdeal qtrobs status gotone

1235 31OCT1999 30SEP1999 1999.4 1999.3 LT 1
1235 31OCT1999 30JUN1999 1999.4 1999.2 LT 2
1235 31OCT1999 31MAR1999 1999.4 1999.1 LT 3
**********************************************************************************
1278 15OCT2000 30SEP2000 2000.4 2000.3 LT 1
1278 15OCT2000 30JUN2000 2000.4 2000.2 LT 2
1278 15OCT2000 31MAR2000 2000.4 2000.1 LT 3
1278 15OCT2000 31DEC1999 2000.4 1999.4 LT 4
1278 15OCT2000 30SEP1999 2000.4 1999.3 LT 5
1278 15OCT2000 30JUN1999 2000.4 1999.2 LT 6
1278 15OCT2000 31MAR1999 2000.4 1999.1 LT 7
**********************************************************************************
1496 15SEP2006 30SEP2008 2006.3 2008.3 GT 0
1496 15SEP2006 30JUN2008 2006.3 2008.2 GT 0
1496 15SEP2006 31MAR2008 2006.3 2008.1 GT 0
1496 15SEP2006 31DEC2007 2006.3 2007.4 GT 0
1496 15SEP2006 30SEP2007 2006.3 2007.3 GT 0
1496 15SEP2006 30JUN2007 2006.3 2007.2 GT 0
1496 15SEP2006 31MAR2007 2006.3 2007.1 GT 0
1496 15SEP2006 31DEC2006 2006.3 2006.4 GT 0
1496 15SEP2006 30SEP2006 2006.3 2006.3 EQ 0
1496 15SEP2006 30JUN2006 2006.3 2006.2 LT 1
1496 15SEP2006 31MAR2006 2006.3 2006.1 LT 2
1496 15SEP2006 31DEC2005 2006.3 2005.4 LT 3
1496 15SEP2006 30SEP2005 2006.3 2005.3 LT 4
1496 15SEP2006 30JUN2005 2006.3 2005.2 LT 5
1496 15SEP2006 31MAR2005 2006.3 2005.1 LT 6
.... more observations for GVKEY 1496 ....
[/pre]

I believe that if the UNIVERSE data is sorted in descending order and you create some extra variables, you can create this type of file with one merge.

The program below uses your UNIVERSE data, but creates some SAMPLE data designed to match only 1235, 1278 and 1496.

Then, in the MERGE step, the merge is done using IN= special variables, this allows a test to be done to make sure that any observation has a match in both files. When you are doing a one to many merge, or you need to control the creation of variables or the selective output of rows, using the IN= variables can be extremely useful.

You can create 2 new variables -- one that represents the quarter of the dealdate and one that represents the quarter of the obs date. By comparing those 2 dates, you easily know whether the observation date quarter is GT the deal date quarter (you do not want this one); whether the observation date quarter is EQ the deal date quarter (you do not want this one); or whether the observation date is LT the deal date quarter (you only want 4 of these -- but the most recent 4 -- which is why you sort the UNIVERSE data in descending order.

In order to get the most recent 4 quarters, you only need to set up a counter variable (GOTONE in my program). At the first of every group, or the FIRST.GVKEY, you set GOTONE to 0. Then you only need to increment GOTONE when the quarter of the obs date is LT the quarter of the deal date.

You can control the output of observations with an IF statement, by testing that the value of GOTONE has to be 1,2,3 or 4 in order to qualify for OUTPUT. At this point in the program, you already know 2 things:
1) this particular observation matched in both files (because it passed the IN= test)
and
2) you have an observation where quarter of obs date is LT quarter of deal date (with obs date sorted in descending order)

You might need to tweak the logic of this a bit when you use real data for SAMPLE instead of fake data. The final MERGE creates 2 datasets -- the ALLMATCH dataset is the one shown above -- you can use this for verifying that the logic is correct for your real data. the ONLY4QTR dataset contains only the 4 quarters.
cynthia
[pre]

options nodate nonumber nocenter;

data UNIVERSE;
length datadate $9.;
infile datalines;
input gvkey datadate $ datacqtr $ sic2 lag4_roa;
obsdate = input(datadate,anydtdte.);
format obsdate date9.;
return;
datalines;
1235 30-Jun-99 1999.2 22 0.01534
1235 30-Sep-99 1999.3 22 0.02700
1235 31-Mar-99 1999.1 22 0.01663
1278 30-Jun-00 2000.2 23 -0.01379
1278 30-Jun-99 1999.2 23 -0.09581
1278 30-Sep-00 2000.3 23 -0.01427
1278 30-Sep-99 1999.3 23 -0.05392
1278 31-Dec-99 1999.4 23 -0.09128
1278 31-Mar-00 2000.1 23 -0.01787
1278 31-Mar-99 1999.1 23 -0.01613
1496 30-Jun-00 2000.2 25 0.06021
1496 30-Jun-01 2001.2 25 0.05269
1496 30-Jun-02 2002.2 25 0.05435
1496 30-Jun-03 2003.2 25 0.02923
1496 30-Jun-04 2004.2 25 0.02190
1496 30-Jun-05 2005.2 25 0.01718
1496 30-Jun-06 2006.2 25 -0.01955
1496 30-Jun-07 2007.2 25 0.00963
1496 30-Jun-08 2008.2 25 -0.00479
1496 30-Jun-99 1999.2 25 0.08410
1496 30-Sep-00 2000.3 25 0.03563
1496 30-Sep-01 2001.3 25 0.03606
1496 30-Sep-02 2002.3 25 0.02185
1496 30-Sep-03 2003.3 25 0.02684
1496 30-Sep-04 2004.3 25 0.01826
1496 30-Sep-05 2005.3 25 0.06419
1496 30-Sep-06 2006.3 25 -0.03246
1496 30-Sep-07 2007.3 25 0.00816
1496 30-Sep-08 2008.3 25 -0.03550
1496 30-Sep-99 1999.3 25 0.08177
1496 31-Dec-00 2000.4 25 0.04078
1496 31-Dec-01 2001.4 25 0.06475
1496 31-Dec-02 2002.4 25 0.02916
1496 31-Dec-03 2003.4 25 0.02524
1496 31-Dec-04 2004.4 25 0.02744
1496 31-Dec-05 2005.4 25 -0.00787
1496 31-Dec-06 2006.4 25 -0.09617
1496 31-Dec-07 2007.4 25 -0.00332
1496 31-Dec-99 1999.4 25 0.06719
1496 31-Mar-00 2000.1 25 0.04244
1496 31-Mar-01 2001.1 25 0.03444
1496 31-Mar-02 2002.1 25 0.04240
1496 31-Mar-03 2003.1 25 0.02642
1496 31-Mar-04 2004.1 25 0.01729
1496 31-Mar-05 2005.1 25 0.02409
1496 31-Mar-06 2006.1 25 -0.26155
1496 31-Mar-07 2007.1 25 0.02740
1496 31-Mar-08 2008.1 25 -0.00994
1496 31-Mar-99 1999.1 25 0.07676
;
run;

** sort by descending obsdate so most current obsdate is first;
proc sort data=universe;
by gvkey descending obsdate;
run;

** show observations (might want to limit this proc print if a big file;
ods listing;
proc print data=universe;
title 'Universe Companies OBSDATE is character DATADATE converted to numeric';
run;

** read sample data;
** note that dealdate for 1496 is not at the beginning of the range;
** for the fake observations for this company;
data SAMPLE;
length acq $50;
infile datalines;
input dealdate : date9. acq $ gvkey id;
format dealdate date9.;
return;
datalines;
31Oct1999 AAAAAAAAAAAAA 1235 2
15OCT2000 BBBBBBBBBBBBB 1278 3
15SEP2006 CCCCCCCCCCCCC 1496 4
;
run;

proc sort data=sample;
by gvkey dealdate;
run;

proc print data=sample;
title 'Sample Companies To Find in UNIVERSE';
run;

** Create 2 datasets -- one for all obs that match (ALLMATCH);
** and one for only the 4 qtr obs (ONLY4QTR) before the DEALDATE;
data allmatch only4qtr;
retain gotone 0;
length qtrdeal qtrobs $8.;
merge sample(in=ins) universe(in=inu);
by gvkey;
** Subsetting IF only "passes" obs which match in both files;
** to rest of program logic;
if (ins and inu);

** make a quarter variable for dealdate and obsdate;
qtrdeal = put(dealdate,yyqp7.);
qtrobs = put(obsdate,yyqp7.);
** for first gvkey obs, set gotone variable to 0;
if first.gvkey then do;
gotone = 0;
end;
** if qtr of obs is LT qtr of deal, then increment gotone;
** remember that obs data is in descending order;
if qtrobs lt qtrdeal then gotone + 1;

***** for verification purposes only;
** do not need this section of code, after you verify that the logic;
** works correctly. Also, do not need ALLMATCH data -- that is for;
** verification purposes, too.;
if qtrobs lt qtrdeal then status='LT';
else if qtrobs eq qtrdeal then status = 'EQ';
else if qtrobs gt qtrdeal then status = 'GT';
output allmatch;
***** for verification purposes only;

** only want the 4 obs where qtr of obsdate is lt qtr of dealdate;
if 1 le gotone le 4 then output only4qtr;
run;

options ls=100;
proc print data=allmatch;
title 'Match in Both Files, but have ALL OBS';
title2 'look at pattern for qtrdeal and qtrobs variables and value of GOTONE';
var gvkey dealdate obsdate qtrdeal qtrobs status gotone ;
format dealdate obsdate date9.;
run;

proc print data=only4qtr;
title 'Match in Both Files, but only 4 quarters before merger';
title2 'This works because the Universe file is in descending order';
title3 'And you only get the values for GOTONE between 1 and 4';
var gvkey dealdate obsdate qtrdeal qtrobs gotone acq id datacqtr sic2 lag4_roa;
format dealdate obsdate date9.;
run;
[/pre]

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2074 views
  • 0 likes
  • 3 in conversation