BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shalmali
Calcite | Level 5


I have two samples of firm- test sample and control sample. I want to find a match for my test sample from the control sample from the same industry. I also want to make sure that if there are two firms in my test sample from same industry, then I need two distinct firms from control sample for matching.

For instance my test sample looks like this:

data sample;

year firm $ industry;

cards;

2000     A     21

2000           B           21

2001     A     21

2002     A     21

2002     C     50

2002     D     60

;

RUN;

The Control sample looks like the following:

    

data control;

year firm $ industry;

cards;

2000    m      21

2000          n               21

2000     o     21

2000     p     21

2001     q     21

2001     r     21

2002     s     21

2002     t     21

2002     u     21

2002      v    50

2002     w     50

2002     x     60

;

RUN;

The final sample (match) should look like:

year  sample  industry     control  

2000     A     21           m

2000           B           21                        n

2001     A     21           q

2002     A     21           s

2002     C     50           v

2002     D     60           x

Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Hi Shalmali,

Please check the below code. Hope it give the desired output.

proc sort data=sample;

by year;

run;

proc sort data=control;

by year;

run;

data test_merge;

merge sample (in=A) control(in=B rename=(firm=control));

by year;

if A and B;

if first.year or last.year then output;

run;

Thanks,

Jag

Thanks,
Jag

View solution in original post

6 REPLIES 6
Jagadishkatam
Amethyst | Level 16

Hi Shalmali,

Please check the below code. Hope it give the desired output.

proc sort data=sample;

by year;

run;

proc sort data=control;

by year;

run;

data test_merge;

merge sample (in=A) control(in=B rename=(firm=control));

by year;

if A and B;

if first.year or last.year then output;

run;

Thanks,

Jag

Thanks,
Jag
shalmali
Calcite | Level 5

Hi Jag,

The program is not giving me the desired results. There are duplicate variables. The output from the program looks like this:

Obs    year    firm    industry    control

1     2000     A         21          m

2     2000     B         21          p

3     2001     A         21          q

4     2001     A         21          r

5     2002     A         21          s

6     2002     D         60          x

I want the matched sample to look like this:

The final sample (match) should look like:

year  sample  industry     control  

2000     A     21           m

2000           B           21                        n

2001     A     21           q

2002     A     21           s

2002     C     50           v

2002     D     60           x

Is there anything to resolve this?

Thank you.

Jagadishkatam
Amethyst | Level 16

Hi Shalmali,

Please check the below code. It gives the exact output you are looking for. Thanks to TD for his valuable code. It helped me to correct my code and reproduce the desired output using the merge statement.

data sample;

set sample;

by year industry;

if first.industry then count=0;

count+1;

run;

data control;

set control;

by year industry;

if first.industry then count=0;

count+1;

run;

proc sort data=sample;

by year industry count;

run;

proc sort data=control;

by year industry count;

run;

data _merge (drop=count);

merge sample(in=a) control(in=b rename=(firm=control));

by year industry count;

if a and b;

run;


Thanks,

Jag

Thanks,
Jag
TD
Calcite | Level 5 TD
Calcite | Level 5

My solution:

data sample;
input year firm $ industry;
cards;
2000     A     21
2000     B     21
2001     A     21
2002     A     21
2002     C     50
2002     D     60
;
RUN;

data control;
input year control $ industry;
cards;
2000    m      21
2000    n      21
2000     o     21
2000     p     21
2001     q     21
2001     r     21
2002     s     21
2002     t     21
2002     u     21
2002      v    50
2002     w     50
2002     x     60
;
RUN;

proc sort data=control;
    by year industry control;
run;

data control;
    set control;
    by year industry control;
    retain dup;
    if first.industry then dup = 0;
    dup = dup + 1;
run;

proc sort data=sample;
   by year industry firm;
run;

data sample;
    set sample;
    by year industry firm;
    retain dup;
    if first.industry then dup = 0;
    dup = dup + 1;
run;

data match;
    if _n_ eq 1 then do;
        if 0 then set sample control;
        declare hash hc(dataset:'control');
        hc.defineKey('year', 'industry', 'dup');
        hc.defineData('control');
        hc.defineDone();
    end;

    set sample;
    drop rc dup;
    rc = hc.find();

run;

Haikuo
Onyx | Level 15

Actually, given the data as is, it can be done just by using one simple merge operation:

data sample;

input

year firm $ industry;

cards;

2000 A 21

2000 B 21

2001 A 21

2002 A 21

2002 C 50

2002 D 60

;

RUN;

data control;

input year firm $ industry;

cards;

2000 m 21

2000 n 21

2000 o 21

2000 p 21

2001 q 21

2001 r 21

2002 s 21

2002 t 21

2002 u 21

2002 v 50

2002 w 50

2002 x 60

;

RUN;

data want;

merge sample control (rename=firm=control_firm);

by year industry ;

if not missing (firm) then output;

call missing(firm);

run;

proc print;run;

Regards,

Haikuo

shalmali
Calcite | Level 5

Thank you all for the codes. I really appreciate for helping me out.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2067 views
  • 10 likes
  • 4 in conversation