Learning SAS? Welcome to the exclusive online community for all SAS learners.

Matching firms by size and industry?

Reply
Occasional Contributor
Posts: 6

Matching firms by size and industry?

I have a sample of 2226 firms(sample1) and i want to find a one to one match for each of these firms from sample2 with the same SIC and where the market_cap is closest to the matching firm and the date is in the same year. I already figured out how to do it allowing for duplicate matches; meaning different firms from sample1 can be matched to the same firm in sample2 but i need one unique match for each of the firms in sample1. the variable in each file are;

sample1: pemno, year, sic, market_cap

sample2Smiley Tongueemno, year ,sic ,market_cap

any help would be very much appreciated.

Respected Advisor
Posts: 4,646

Re: Matching firms by size and industry?

If you have the licence for SAS/OR, optimal assignment can be found with proc optnet, this way:

/* Create some fake data */

data sample1 sample2;

call streaminit(8669);

sic = "AAA";

do pemno = 1, 2, 3;

    do year = 2000, 2001;

        market_cap = 10 * rand("UNIFORM");

        output sample1;

        end;

    end;

do pemno = 10 to 20;

    do year = 2000, 2001;

        market_cap = 10 * rand("UNIFORM");

        output sample2;

        end;

    end;

run;

/* Generate the distances (weight) between possible matches in proc optnet LINKS format */

proc sql;

create table match as

select

    catx("-", S1.sic, S1.year, S1.pemno) as from length=24,

    catx("-", S2.sic, S2.year, S2.pemno) as to length=24,

    abs(S1.market_cap - S2.market_cap) as weight

from

    sample1 as S1 inner join

    Sample2 as S2 on S1.sic=S2.sic and S1.year=S2.year

where calculated weight < 3  /* maximum distance allowed (optional) */

order by from, to;


title "Naive assignment, with possible duplicates";

select * from match group by from

having weight = min(weight);

quit;

/* Find best assignment */

proc optnet links=match direction=directed;

linear_assignment out=assign;

run;

title "Best assignment";

proc print data=assign noobs; run;

PG

PG
Respected Advisor
Posts: 4,646

Re: Matching firms by size and industry?

Added note: Your two samples (Sample1 and Sample2 in the example above) should be disjoint. They should not have any pemno in common.

PG

PG
Occasional Contributor
Posts: 6

Re: Matching firms by size and industry?

i noticed and i got rid of the same permnos at sample 2. the optnet still didnt give me the desired results. i craeated my control sample using a macro. thanks for your help anyway.

Respected Advisor
Posts: 4,646

Re: Matching firms by size and industry?

What did you get? Wrong matches, error messages?

PG
Occasional Contributor
Posts: 6

Re: Matching firms by size and industry?

optnet was returning only one row

Occasional Contributor
Posts: 6

Re: Matching firms by size and industry?

This is the code that gave me the result i wanted.

data litigation;

set litigation;

if market_cap ne '0';

year_litigation=year(filing_date);

run;

proc sort data=litigation;

by permno;

run;

proc sort data=sasuser.monthly;

by permno;

run;

data mix;

merge sasuser.monthly litigation;

by permno;

run;

data control_sample_source;

set mix;

if filing_name=' ';

keep permno date cusip comnam TICKER prc siccd SHROUT;

run;

data control_sample_source;

set control_sample_source;

length sic $3;

sic=substr(left(put(siccd,best12.)),1,3);

drop siccd;

prc=abs(prc);

mkt_cap=prc*(SHROUT*1000);

mkt_cap=mkt_cap/1000000;

run;

quit;

data control_sample_source;

set control_sample_source;

SIC_3 = input(sic,3.0);

DROP SIC;

RENAME PERMNO=PERMNO_CONTROL TICKER=TICKER_CONTROL;

year=year(date);

run;

data control_sample_source;

set control_sample_source;

if mkt_cap ne '.';

run;

data litigation;

set litigation;

source=_n_;

run;

PROC SORT DATA=CONTROL_SAMPLE_SOURCE;

BY PERMNO_CONTROL;

RUN;

data control_sample_source;

set control_sample_source;

IF PERMNO_CONTROL^=LAG(PERMNO_CONTROL) THEN SOURCE_CONTROL+1;

  ELSE SOURCE_CONTROL+0;

  RUN;

proc sql;

create table possmatch as

select litigation.source as caseid, litigation.permno , litigation.filing_date,litigation.Settlement_Announcment, litigation.cus6,

litigation.Header_SIC_Industry_Group as SIC,litigation.Company_Name, litigation.market_cap,litigation.year_litigation,

control_sample_source.source_control as controlid, control_sample_source.permno_control, control_sample_source.date,

control_sample_source.CUSIP,control_sample_source.sic_3, control_sample_source.COMNAM, control_sample_source.mkt_cap,

control_sample_source.year,

abs(litigation.market_cap-control_sample_source.mkt_cap) as sdiff

from litigation, control_sample_source

where litigation.year_litigation=control_sample_source.year AND

litigation.Header_SIC_Industry_Group=control_sample_source.sic_3

order by caseid, controlid,sdiff;

data possmatch;

set possmatch;

by caseid controlid year;

if first.controlid then output;

run;

%MACRO MATCHUP(RESULT, POTENMCH, CASID, CTRLID, DIFF); /*1*/

%LOCAL I J;

%LET I = 0;

%DO %UNTIL (&SQLOBS = 0); /*2*/

%LET I = %EVAL(&I + 1);

PROC SORT DATA = &POTENMCH; BY &CASID &DIFF; /*3*/

DATA BESTMCH; /*4*/

SET &POTENMCH;

BY &CASID;

IF FIRST.&CASID THEN OUTPUT;

PROC SORT DATA = BESTMCH; BY &CTRLID &DIFF; /*5*/

DATA MATCH&I;

SET BESTMCH;

BY &CTRLID;

IF FIRST.&CTRLID THEN OUTPUT;

PROC SQL; /*6*/

CREATE TABLE &POTENMCH AS

SELECT &POTENMCH..*

FROM &POTENMCH

WHERE &CASID NOT IN

(SELECT &CASID

FROM MATCH&I)

AND &CTRLID NOT IN

(SELECT &CTRLID

FROM MATCH&I);

%END;

PROC DATASETS; DELETE &POTENMCH;

DATA &RESULT; /*7*/

SET

%DO J = 1 %TO &I;

MATCH&J

%END;

;

%MEND MATCHUP;

%MATCHUP(MATCHES,POSSMATCH,CASEID,CONTROLID,SDIFF);

Respected Advisor
Posts: 4,646

Re: Matching firms by size and industry?

Your method will most likely find you good matches but not necessarily the best. Consider the following example:

data case control;

input set $ point $ pos;

if set="case"

then output case;

else output control;

drop set;

datalines;

case a 0

case b 3

control r 2

control s 6

;

proc sql;

create table match as

select

    case.point as from,

    control.point as to,

    abs(case.pos-control.pos) as weight

from case, control;

select * from match;

quit;

                            from      to          weight

                            ----------------------------

                            a         r                2

                            a         s                6

                            b         r                1

                            b         s                3

/* Find best assignment */

proc optnet links=match direction=directed;

linear_assignment out=assign;

run;

proc print data=assign noobs; run;

                                from    to    weight

                                 a      r        2

                                 b      s        3

From my understanding of your method, b would be matched with r first, which would leave a to be matched with s. So you would get:

                                from    to    weight

                                 b      r        1

                                 a      s        6

which is not optimal.

PG

PG
Regular Contributor
Posts: 168

Re: Matching firms by size and industry?

how does (case.pos-control.pos) it resolve whilst creating the table 'match'?

Respected Advisor
Posts: 4,646

Re: Matching firms by size and industry?

In my example, abs(case.pos-control.pos) is the measure of discrepancy between a case and a control. It could be replaced with any measure of distance. Optimal assignment tries to minimize the overall discrepancy. In the assignment found by optnet, the overall discrepancy is 5 while for the other assignment it is 7.

PG

PG
Senior User
Posts: 1

Re: Matching firms by size and industry?

What if I have more than 3 controls to match the samples, how I should modify the code "case.point as from , control . point as to,"? Thank you 

Ask a Question
Discussion stats
  • 10 replies
  • 1463 views
  • 0 likes
  • 4 in conversation