BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shalmali
Calcite | Level 5


I have two samples of firm- test sample and control sample. I want to find a match for my test sample from the control sample from the same industry. I also want to make sure that if there are two firms in my test sample from same industry, then I need two distinct firms from control sample for matching.

For instance my test sample looks like this:

data sample;

year firm $ industry;

cards;

2000     A     21

2000           B           21

2001     A     21

2002     A     21

2002     C     50

2002     D     60

;

RUN;

The Control sample looks like the following:

    

data control;

year firm $ industry;

cards;

2000    m      21

2000          n               21

2000     o     21

2000     p     21

2001     q     21

2001     r     21

2002     s     21

2002     t     21

2002     u     21

2002      v    50

2002     w     50

2002     x     60

;

RUN;

The final sample (match) should look like:

year  sample  industry     control  

2000     A     21           m

2000           B           21                        n

2001     A     21           q

2002     A     21           s

2002     C     50           v

2002     D     60           x

Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Hi Shalmali,

Please check the below code. Hope it give the desired output.

proc sort data=sample;

by year;

run;

proc sort data=control;

by year;

run;

data test_merge;

merge sample (in=A) control(in=B rename=(firm=control));

by year;

if A and B;

if first.year or last.year then output;

run;

Thanks,

Jag

Thanks,
Jag

View solution in original post

6 REPLIES 6
Jagadishkatam
Amethyst | Level 16

Hi Shalmali,

Please check the below code. Hope it give the desired output.

proc sort data=sample;

by year;

run;

proc sort data=control;

by year;

run;

data test_merge;

merge sample (in=A) control(in=B rename=(firm=control));

by year;

if A and B;

if first.year or last.year then output;

run;

Thanks,

Jag

Thanks,
Jag
shalmali
Calcite | Level 5

Hi Jag,

The program is not giving me the desired results. There are duplicate variables. The output from the program looks like this:

Obs    year    firm    industry    control

1     2000     A         21          m

2     2000     B         21          p

3     2001     A         21          q

4     2001     A         21          r

5     2002     A         21          s

6     2002     D         60          x

I want the matched sample to look like this:

The final sample (match) should look like:

year  sample  industry     control  

2000     A     21           m

2000           B           21                        n

2001     A     21           q

2002     A     21           s

2002     C     50           v

2002     D     60           x

Is there anything to resolve this?

Thank you.

Jagadishkatam
Amethyst | Level 16

Hi Shalmali,

Please check the below code. It gives the exact output you are looking for. Thanks to TD for his valuable code. It helped me to correct my code and reproduce the desired output using the merge statement.

data sample;

set sample;

by year industry;

if first.industry then count=0;

count+1;

run;

data control;

set control;

by year industry;

if first.industry then count=0;

count+1;

run;

proc sort data=sample;

by year industry count;

run;

proc sort data=control;

by year industry count;

run;

data _merge (drop=count);

merge sample(in=a) control(in=b rename=(firm=control));

by year industry count;

if a and b;

run;


Thanks,

Jag

Thanks,
Jag
TD
Calcite | Level 5 TD
Calcite | Level 5

My solution:

data sample;
input year firm $ industry;
cards;
2000     A     21
2000     B     21
2001     A     21
2002     A     21
2002     C     50
2002     D     60
;
RUN;

data control;
input year control $ industry;
cards;
2000    m      21
2000    n      21
2000     o     21
2000     p     21
2001     q     21
2001     r     21
2002     s     21
2002     t     21
2002     u     21
2002      v    50
2002     w     50
2002     x     60
;
RUN;

proc sort data=control;
    by year industry control;
run;

data control;
    set control;
    by year industry control;
    retain dup;
    if first.industry then dup = 0;
    dup = dup + 1;
run;

proc sort data=sample;
   by year industry firm;
run;

data sample;
    set sample;
    by year industry firm;
    retain dup;
    if first.industry then dup = 0;
    dup = dup + 1;
run;

data match;
    if _n_ eq 1 then do;
        if 0 then set sample control;
        declare hash hc(dataset:'control');
        hc.defineKey('year', 'industry', 'dup');
        hc.defineData('control');
        hc.defineDone();
    end;

    set sample;
    drop rc dup;
    rc = hc.find();

run;

Haikuo
Onyx | Level 15

Actually, given the data as is, it can be done just by using one simple merge operation:

data sample;

input

year firm $ industry;

cards;

2000 A 21

2000 B 21

2001 A 21

2002 A 21

2002 C 50

2002 D 60

;

RUN;

data control;

input year firm $ industry;

cards;

2000 m 21

2000 n 21

2000 o 21

2000 p 21

2001 q 21

2001 r 21

2002 s 21

2002 t 21

2002 u 21

2002 v 50

2002 w 50

2002 x 60

;

RUN;

data want;

merge sample control (rename=firm=control_firm);

by year industry ;

if not missing (firm) then output;

call missing(firm);

run;

proc print;run;

Regards,

Haikuo

shalmali
Calcite | Level 5

Thank you all for the codes. I really appreciate for helping me out.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1489 views
  • 10 likes
  • 4 in conversation