Help using Base SAS procedures

Matched observations

Reply
Regular Contributor
Posts: 162

Matched observations

Dear all,

I wish match each firm-year (tic-fyear) observation with another from the same two-digit SIC code (sic2) and year (fyear) with the closest return on assets in the current year, ROA1. Then, I wish to obtain a performance-matched AEM (ie Jones_PDA) for all firm-years,  by subtracting the AEM of the firm with the closest ROA that is in the same industry as the firm.

The closest ROA = min(abs(a.ROA1, -b.ROA1)). The lowest difference between ROA1 would be defined as closest ROA1.

I ran a sql using the following program, however, it provides Jones_PDA for all the matched firms. I only wish to have the Jones_PDA for the matched firm with closest ROA instead. It does mean that I wish to have Jones_PDA for only ONE matched firm.

proc sql;

create table Jones_matchedROA1 as

select a.tic,a.fyear,a.sic2,

min(abs(a.ROA1, -b.ROA1)) as closest_ROA1,

sum(a.aem,-b.aem) as Jones_PDA

from geog.data a, geog.data b

where a.tic ne b.tic and a.fyear=b.fyear and a.sic2=b.sic2

group by a.tic,a.fyear,a.sic2

order by a.tic,a.fyear;

quit;

Could anyone guide me on how to modify the program/code to achieve my desired output?

Thank you in advance for advices.

Regards,

mspak

Attachment
Super User
Posts: 9,687

Re: Matched observations

I can't open your attachment. Maybe it is corroded.

And What outpu do you want ?

If you only need to remove replicated observations. Proc sort + nodupkey is a good tool.

Ksharp

Regular Contributor
Posts: 162

Re: Matched observations

Thanks Ksharp,

I changed the file name as "mydata" for this new file. I have the following variables:

1. tic - company identification code

2. fyear - financial year

3. ROA1 - performance measure, returns on assets

4. SIC2 - 2-digit SIC industry classification code

5. AEM - it is a residual as an output from proc reg

I wish to have an output file with the following columns:

1. tic

2. fyear

3. Jones_PDA

Jones_PDA is the difference between the firm's AEM with the another matched firm in the following criteria:

a) same 2-digit SIC code

b) same fyear

c) with the closest ROA1.

The lowest difference between ROA1 of the firm itself and the matched firm would be defined as closest ROA1. As such, I calculate the difference as = min(abs(a.ROA1, -b.ROA1))

There are many firms can be matched with the firm based on SIC code and fyear. However, I would wish to restrict the ONLY ONE matched firm, that is, the one with the lowest difference in terms of their ROA1. In order to avoid self-matched, so I provide a restriction a.tic ne b.tic.

Finally, I would wish to calculate the differences in AEM between the firm and the matched firm (ie. Jones_PDA).

I hope this make my points clearer.

Thank you in advance.

mspak

Attachment
Regular Contributor
Posts: 162

Re: Matched observations

Hi again,

One correction:

The lowest difference between ROA1 of the firm itself and the matched firm would be defined as closest ROA1. As such, I calculate the difference as = min(abs(sum(a.ROA1, -b.ROA1))).

1. the difference between ROA1

2. as the difference can be positive and negative, so I used ABS (absolute value).

3. In order to further restrict to only firm, so I should choose the lowest value.

Thanks again.

mspak

Regular Contributor
Posts: 162

Re: Matched observations

Hi all,

This is the file. I tried to upload the file from my notebook's desktop instead of my hard disk. I am not sure whether it works.

mspak

Attachment
Regular Contributor
Posts: 162

Re: Matched observations

Hi all,

I tried to open this file. This data can be opened.

Thank you.

Regards,

MEI SEN

Super User
Posts: 9,687

Re: Matched observations

OK. MEI SEN

I noticed a problem in your code.

min(abs(a.ROA1-b.ROA1)) as closest_ROA1,

sum(a.aem-b.aem) as Jones_PDA

You don't need a comma between them.

If you remove comma, maybe the result is what you need.

Ksharp

Regular Contributor
Posts: 162

Re: Matched observations

Thanks Ksharp again,

As I used sum() function, so I include comma between them. If I remove the comma, the following error message appeared:

ERROR: Summary functions nested in this way are not supported.

I used SUM instead of operator of MINUS (ie. -) as SUM will provide a value even there is missing value either on a.ROA1 or b.ROA1.

Regards,

Mspak

Regular Contributor
Posts: 162

Re: Matched observations

Hi Ksharp,

I ran the following program, it seems leading to my desired output:

proc sql;

create table Jones_matchedROA1 as

select unique a.tic,a.fyear,a.sic2,a.aem, a.roa1, b.roa1,

abs(sum(a.ROA1, -b.ROA1))as DIFF_ROA1, min(calculated DIFF_ROA1)as CLOSEST_ROA1,

sum(a.aem,-b.aem)as Jones_PDA

from geog.mydata a, geog.mydata b

where a.tic ne b.tic and a.fyear=b.fyear and a.sic2=b.sic2

group by a.tic,a.fyear,a.sic2

having DIFF_ROA1=CLOSEST_ROA1;

quit;

Only ONE observation per firm-year will be derived by providing "having DIFF_ROA1=CLOSEST_ROA1" clause, that is the one with lowest difference.

Can anyone confirm this is correct? 

Thank you very much.

Regards,

MEI SEN

Super User
Posts: 9,687

Re: Matched observations

En....

You code looks good.

having DIFF_ROA1=CLOSEST_ROA1;

Only pick up the observation which has the minimize value for each group.

Ksharp

Regular Contributor
Posts: 162

Re: Matched observations

Thanks Ksharp and other experts' help,

I have learned a lot from the discussion in SAS forum.

Smiley HappySmiley HappySmiley Happy

Regards,

mspak

Ask a Question
Discussion stats
  • 10 replies
  • 1253 views
  • 0 likes
  • 2 in conversation