Dear all,
I wish match each firm-year (tic-fyear) observation with another from the same two-digit SIC code (sic2) and year (fyear) with the closest return on assets in the current year, ROA1. Then, I wish to obtain a performance-matched AEM (ie Jones_PDA) for all firm-years, by subtracting the AEM of the firm with the closest ROA that is in the same industry as the firm.
The closest ROA = min(abs(a.ROA1, -b.ROA1)). The lowest difference between ROA1 would be defined as closest ROA1.
I ran a sql using the following program, however, it provides Jones_PDA for all the matched firms. I only wish to have the Jones_PDA for the matched firm with closest ROA instead. It does mean that I wish to have Jones_PDA for only ONE matched firm.
proc sql;
create table Jones_matchedROA1 as
select a.tic,a.fyear,a.sic2,
min(abs(a.ROA1, -b.ROA1)) as closest_ROA1,
sum(a.aem,-b.aem) as Jones_PDA
from geog.data a, geog.data b
where a.tic ne b.tic and a.fyear=b.fyear and a.sic2=b.sic2
group by a.tic,a.fyear,a.sic2
order by a.tic,a.fyear;
quit;
Could anyone guide me on how to modify the program/code to achieve my desired output?
Thank you in advance for advices.
Regards,
mspak
I can't open your attachment. Maybe it is corroded.
And What outpu do you want ?
If you only need to remove replicated observations. Proc sort + nodupkey is a good tool.
Ksharp
Thanks Ksharp,
I changed the file name as "mydata" for this new file. I have the following variables:
1. tic - company identification code
2. fyear - financial year
3. ROA1 - performance measure, returns on assets
4. SIC2 - 2-digit SIC industry classification code
5. AEM - it is a residual as an output from proc reg
I wish to have an output file with the following columns:
1. tic
2. fyear
3. Jones_PDA
Jones_PDA is the difference between the firm's AEM with the another matched firm in the following criteria:
a) same 2-digit SIC code
b) same fyear
c) with the closest ROA1.
The lowest difference between ROA1 of the firm itself and the matched firm would be defined as closest ROA1. As such, I calculate the difference as = min(abs(a.ROA1, -b.ROA1))
There are many firms can be matched with the firm based on SIC code and fyear. However, I would wish to restrict the ONLY ONE matched firm, that is, the one with the lowest difference in terms of their ROA1. In order to avoid self-matched, so I provide a restriction a.tic ne b.tic.
Finally, I would wish to calculate the differences in AEM between the firm and the matched firm (ie. Jones_PDA).
I hope this make my points clearer.
Thank you in advance.
mspak
Hi again,
One correction:
The lowest difference between ROA1 of the firm itself and the matched firm would be defined as closest ROA1. As such, I calculate the difference as = min(abs(sum(a.ROA1, -b.ROA1))).
1. the difference between ROA1
2. as the difference can be positive and negative, so I used ABS (absolute value).
3. In order to further restrict to only firm, so I should choose the lowest value.
Thanks again.
mspak
Hi all,
This is the file. I tried to upload the file from my notebook's desktop instead of my hard disk. I am not sure whether it works.
mspak
Hi all,
I tried to open this file. This data can be opened.
Thank you.
Regards,
MEI SEN
OK. MEI SEN
I noticed a problem in your code.
min(abs(a.ROA1-b.ROA1)) as closest_ROA1,
sum(a.aem-b.aem) as Jones_PDA
You don't need a comma between them.
If you remove comma, maybe the result is what you need.
Ksharp
Thanks Ksharp again,
As I used sum() function, so I include comma between them. If I remove the comma, the following error message appeared:
ERROR: Summary functions nested in this way are not supported.
I used SUM instead of operator of MINUS (ie. -) as SUM will provide a value even there is missing value either on a.ROA1 or b.ROA1.
Regards,
Mspak
Hi Ksharp,
I ran the following program, it seems leading to my desired output:
proc sql;
create table Jones_matchedROA1 as
select unique a.tic,a.fyear,a.sic2,a.aem, a.roa1, b.roa1,
abs(sum(a.ROA1, -b.ROA1))as DIFF_ROA1, min(calculated DIFF_ROA1)as CLOSEST_ROA1,
sum(a.aem,-b.aem)as Jones_PDA
from geog.mydata a, geog.mydata b
where a.tic ne b.tic and a.fyear=b.fyear and a.sic2=b.sic2
group by a.tic,a.fyear,a.sic2
having DIFF_ROA1=CLOSEST_ROA1;
quit;
Only ONE observation per firm-year will be derived by providing "having DIFF_ROA1=CLOSEST_ROA1" clause, that is the one with lowest difference.
Can anyone confirm this is correct?
Thank you very much.
Regards,
MEI SEN
En....
You code looks good.
having DIFF_ROA1=CLOSEST_ROA1;
Only pick up the observation which has the minimize value for each group.
Ksharp
Thanks Ksharp and other experts' help,
I have learned a lot from the discussion in SAS forum.
Regards,
mspak
Hi,
Good afternoon. I am working on the same replication paper: SP Kothari 2005. Could you tell me if the you have encountered the problem that a performance matched firm can bot be obtained? I mean I used your code with the modification of Ksharp. I did not find any missing var for the ***PDAs.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.