Help using Base SAS procedures

Match test firms with control firms based on industry.

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 96
Accepted Solution

Match test firms with control firms based on industry.


I have two samples of firm- test sample and control sample. I want to find a match for my test sample from the control sample from the same industry. I also want to make sure that if there are two firms in my test sample from same industry, then I need two distinct firms from control sample for matching.

For instance my test sample looks like this:

data sample;

year firm $ industry;

cards;

2000     A     21

2000           B           21

2001     A     21

2002     A     21

2002     C     50

2002     D     60

;

RUN;

The Control sample looks like the following:

    

data control;

year firm $ industry;

cards;

2000    m      21

2000          n               21

2000     o     21

2000     p     21

2001     q     21

2001     r     21

2002     s     21

2002     t     21

2002     u     21

2002      v    50

2002     w     50

2002     x     60

;

RUN;

The final sample (match) should look like:

year  sample  industry     control  

2000     A     21           m

2000           B           21                        n

2001     A     21           q

2002     A     21           s

2002     C     50           v

2002     D     60           x

Thank you in advance.


Accepted Solutions
Solution
‎04-30-2012 01:40 AM
Trusted Advisor
Posts: 1,128

Re: Match test firms with control firms based on industry.

Hi Shalmali,

Please check the below code. Hope it give the desired output.

proc sort data=sample;

by year;

run;

proc sort data=control;

by year;

run;

data test_merge;

merge sample (in=A) control(in=B rename=(firm=control));

by year;

if A and B;

if first.year or last.year then output;

run;

Thanks,

Jag

Thanks,
Jag

View solution in original post


All Replies
Solution
‎04-30-2012 01:40 AM
Trusted Advisor
Posts: 1,128

Re: Match test firms with control firms based on industry.

Hi Shalmali,

Please check the below code. Hope it give the desired output.

proc sort data=sample;

by year;

run;

proc sort data=control;

by year;

run;

data test_merge;

merge sample (in=A) control(in=B rename=(firm=control));

by year;

if A and B;

if first.year or last.year then output;

run;

Thanks,

Jag

Thanks,
Jag
Frequent Contributor
Posts: 96

Re: Match test firms with control firms based on industry.

Hi Jag,

The program is not giving me the desired results. There are duplicate variables. The output from the program looks like this:

Obs    year    firm    industry    control

1     2000     A         21          m

2     2000     B         21          p

3     2001     A         21          q

4     2001     A         21          r

5     2002     A         21          s

6     2002     D         60          x

I want the matched sample to look like this:

The final sample (match) should look like:

year  sample  industry     control  

2000     A     21           m

2000           B           21                        n

2001     A     21           q

2002     A     21           s

2002     C     50           v

2002     D     60           x

Is there anything to resolve this?

Thank you.

Trusted Advisor
Posts: 1,128

Re: Match test firms with control firms based on industry.

Hi Shalmali,

Please check the below code. It gives the exact output you are looking for. Thanks to TD for his valuable code. It helped me to correct my code and reproduce the desired output using the merge statement.

data sample;

set sample;

by year industry;

if first.industry then count=0;

count+1;

run;

data control;

set control;

by year industry;

if first.industry then count=0;

count+1;

run;

proc sort data=sample;

by year industry count;

run;

proc sort data=control;

by year industry count;

run;

data _merge (drop=count);

merge sample(in=a) control(in=b rename=(firm=control));

by year industry count;

if a and b;

run;


Thanks,

Jag

Thanks,
Jag
Occasional Contributor TD
Occasional Contributor
Posts: 13

Re: Match test firms with control firms based on industry.

My solution:

data sample;
input year firm $ industry;
cards;
2000     A     21
2000     B     21
2001     A     21
2002     A     21
2002     C     50
2002     D     60
;
RUN;

data control;
input year control $ industry;
cards;
2000    m      21
2000    n      21
2000     o     21
2000     p     21
2001     q     21
2001     r     21
2002     s     21
2002     t     21
2002     u     21
2002      v    50
2002     w     50
2002     x     60
;
RUN;

proc sort data=control;
    by year industry control;
run;

data control;
    set control;
    by year industry control;
    retain dup;
    if first.industry then dup = 0;
    dup = dup + 1;
run;

proc sort data=sample;
   by year industry firm;
run;

data sample;
    set sample;
    by year industry firm;
    retain dup;
    if first.industry then dup = 0;
    dup = dup + 1;
run;

data match;
    if _n_ eq 1 then do;
        if 0 then set sample control;
        declare hash hc(dataset:'control');
        hc.defineKey('year', 'industry', 'dup');
        hc.defineData('control');
        hc.defineDone();
    end;

    set sample;
    drop rc dup;
    rc = hc.find();

run;

Respected Advisor
Posts: 3,124

Re: Match test firms with control firms based on industry.

Actually, given the data as is, it can be done just by using one simple merge operation:

data sample;

input

year firm $ industry;

cards;

2000 A 21

2000 B 21

2001 A 21

2002 A 21

2002 C 50

2002 D 60

;

RUN;

data control;

input year firm $ industry;

cards;

2000 m 21

2000 n 21

2000 o 21

2000 p 21

2001 q 21

2001 r 21

2002 s 21

2002 t 21

2002 u 21

2002 v 50

2002 w 50

2002 x 60

;

RUN;

data want;

merge sample control (rename=firm=control_firm);

by year industry ;

if not missing (firm) then output;

call missing(firm);

run;

proc print;run;

Regards,

Haikuo

Frequent Contributor
Posts: 96

Re: Match test firms with control firms based on industry.

Thank you all for the codes. I really appreciate for helping me out.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 630 views
  • 10 likes
  • 4 in conversation