Re: Matched observations

mspak · Posted 04-11-2012 03:47 AM

Dear all,

I wish match each firm-year (tic-fyear) observation with another from the same two-digit SIC code (sic2) and year (fyear) with the closest return on assets in the current year, ROA1. Then, I wish to obtain a performance-matched AEM (ie Jones_PDA) for all firm-years, by subtracting the AEM of the firm with the closest ROA that is in the same industry as the firm.

The closest ROA = min(abs(a.ROA1, -b.ROA1)). The lowest difference between ROA1 would be defined as closest ROA1.

I ran a sql using the following program, however, it provides Jones_PDA for all the matched firms. I only wish to have the Jones_PDA for the matched firm with closest ROA instead. It does mean that I wish to have Jones_PDA for only ONE matched firm.

proc sql;

create table Jones_matchedROA1 as

select a.tic,a.fyear,a.sic2,

min(abs(a.ROA1, -b.ROA1)) as closest_ROA1,

sum(a.aem,-b.aem) as Jones_PDA

from geog.data a, geog.data b

where a.tic ne b.tic and a.fyear=b.fyear and a.sic2=b.sic2

group by a.tic,a.fyear,a.sic2

order by a.tic,a.fyear;

quit;

Could anyone guide me on how to modify the program/code to achieve my desired output?

Thank you in advance for advices.

Regards,

mspak

Ksharp · Posted 04-11-2012 04:12 AM

I can't open your attachment. Maybe it is corroded.

And What outpu do you want ?

If you only need to remove replicated observations. Proc sort + nodupkey is a good tool.

Ksharp

mspak · Posted 04-11-2012 05:37 AM

Thanks Ksharp,

I changed the file name as "mydata" for this new file. I have the following variables:

1. tic - company identification code

2. fyear - financial year

3. ROA1 - performance measure, returns on assets

4. SIC2 - 2-digit SIC industry classification code

5. AEM - it is a residual as an output from proc reg

I wish to have an output file with the following columns:

1. tic

2. fyear

3. Jones_PDA

Jones_PDA is the difference between the firm's AEM with the another matched firm in the following criteria:

a) same 2-digit SIC code

b) same fyear

c) with the closest ROA1.

The lowest difference between ROA1 of the firm itself and the matched firm would be defined as closest ROA1. As such, I calculate the difference as = min(abs(a.ROA1, -b.ROA1))

There are many firms can be matched with the firm based on SIC code and fyear. However, I would wish to restrict the ONLY ONE matched firm, that is, the one with the lowest difference in terms of their ROA1. In order to avoid self-matched, so I provide a restriction a.tic ne b.tic.

Finally, I would wish to calculate the differences in AEM between the firm and the matched firm (ie. Jones_PDA).

I hope this make my points clearer.

Thank you in advance.

mspak

mspak · Posted 04-11-2012 05:43 AM

Hi again,

One correction:

The lowest difference between ROA1 of the firm itself and the matched firm would be defined as closest ROA1. As such, I calculate the difference as = min(abs(sum(a.ROA1, -b.ROA1))).

1. the difference between ROA1

2. as the difference can be positive and negative, so I used ABS (absolute value).

3. In order to further restrict to only firm, so I should choose the lowest value.

Thanks again.

mspak

mspak · Posted 04-11-2012 10:46 AM

Hi all,

This is the file. I tried to upload the file from my notebook's desktop instead of my hard disk. I am not sure whether it works.

mspak

mspak · Posted 04-11-2012 07:29 PM

Hi all,

I tried to open this file. This data can be opened.

Thank you.

Regards,

MEI SEN

Ksharp · Posted 04-12-2012 02:03 AM

OK. MEI SEN

I noticed a problem in your code.

min(abs(a.ROA1-b.ROA1)) as closest_ROA1,

sum(a.aem-b.aem) as Jones_PDA

You don't need a comma between them.

If you remove comma, maybe the result is what you need.

Ksharp

mspak · Posted 04-12-2012 02:28 AM

Thanks Ksharp again,

As I used sum() function, so I include comma between them. If I remove the comma, the following error message appeared:

ERROR: Summary functions nested in this way are not supported.

I used SUM instead of operator of MINUS (ie. -) as SUM will provide a value even there is missing value either on a.ROA1 or b.ROA1.

Regards,

Mspak

mspak · Posted 04-12-2012 02:06 AM

Hi Ksharp,

I ran the following program, it seems leading to my desired output:

proc sql;

create table Jones_matchedROA1 as

select unique a.tic,a.fyear,a.sic2,a.aem, a.roa1, b.roa1,

abs(sum(a.ROA1, -b.ROA1))as DIFF_ROA1, min(calculated DIFF_ROA1)as CLOSEST_ROA1,

sum(a.aem,-b.aem)as Jones_PDA

from geog.mydata a, geog.mydata b

where a.tic ne b.tic and a.fyear=b.fyear and a.sic2=b.sic2

group by a.tic,a.fyear,a.sic2

having DIFF_ROA1=CLOSEST_ROA1;

quit;

Only ONE observation per firm-year will be derived by providing "having DIFF_ROA1=CLOSEST_ROA1" clause, that is the one with lowest difference.

Can anyone confirm this is correct?

Thank you very much.

Regards,

MEI SEN

Ksharp · Posted 04-12-2012 05:01 AM

En....

You code looks good.

having DIFF_ROA1=CLOSEST_ROA1;

Only pick up the observation which has the minimize value for each group.

Ksharp

mspak · Posted 04-12-2012 07:48 AM

Thanks Ksharp and other experts' help,

I have learned a lot from the discussion in SAS forum.

Regards,

mspak

fafrin420 · Posted 12-21-2017 01:27 PM

Hi,

Good afternoon. I am working on the same replication paper: SP Kothari 2005. Could you tell me if the you have encountered the problem that a performance matched firm can bot be obtained? I mean I used your code with the modification of Ksharp. I did not find any missing var for the ***PDAs.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away