I have two samples of firm- test sample and control sample. I want to find a match for my test sample from the control sample from the same industry. I also want to make sure that if there are two firms in my test sample from same industry, then I need two distinct firms from control sample for matching.
For instance my test sample looks like this:
data sample;
year firm $ industry;
cards;
2000 A 21
2000 B 21
2001 A 21
2002 A 21
2002 C 50
2002 D 60
;
RUN;
The Control sample looks like the following:
data control;
year firm $ industry;
cards;
2000 m 21
2000 n 21
2000 o 21
2000 p 21
2001 q 21
2001 r 21
2002 s 21
2002 t 21
2002 u 21
2002 v 50
2002 w 50
2002 x 60
;
RUN;
The final sample (match) should look like:
year sample industry control
2000 A 21 m
2000 B 21 n
2001 A 21 q
2002 A 21 s
2002 C 50 v
2002 D 60 x
Thank you in advance.
Hi Shalmali,
Please check the below code. Hope it give the desired output.
proc sort data=sample;
by year;
run;
proc sort data=control;
by year;
run;
data test_merge;
merge sample (in=A) control(in=B rename=(firm=control));
by year;
if A and B;
if first.year or last.year then output;
run;
Thanks,
Jag
Hi Shalmali,
Please check the below code. Hope it give the desired output.
proc sort data=sample;
by year;
run;
proc sort data=control;
by year;
run;
data test_merge;
merge sample (in=A) control(in=B rename=(firm=control));
by year;
if A and B;
if first.year or last.year then output;
run;
Thanks,
Jag
Hi Jag,
The program is not giving me the desired results. There are duplicate variables. The output from the program looks like this:
Obs year firm industry control
1 2000 A 21 m
2 2000 B 21 p
3 2001 A 21 q
4 2001 A 21 r
5 2002 A 21 s
6 2002 D 60 x
I want the matched sample to look like this:
The final sample (match) should look like:
year sample industry control
2000 A 21 m
2000 B 21 n
2001 A 21 q
2002 A 21 s
2002 C 50 v
2002 D 60 x
Is there anything to resolve this?
Thank you.
Hi Shalmali,
Please check the below code. It gives the exact output you are looking for. Thanks to TD for his valuable code. It helped me to correct my code and reproduce the desired output using the merge statement.
data sample;
set sample;
by year industry;
if first.industry then count=0;
count+1;
run;
data control;
set control;
by year industry;
if first.industry then count=0;
count+1;
run;
proc sort data=sample;
by year industry count;
run;
proc sort data=control;
by year industry count;
run;
data _merge (drop=count);
merge sample(in=a) control(in=b rename=(firm=control));
by year industry count;
if a and b;
run;
Thanks,
Jag
My solution:
data sample;
input year firm $ industry;
cards;
2000 A 21
2000 B 21
2001 A 21
2002 A 21
2002 C 50
2002 D 60
;
RUN;
data control;
input year control $ industry;
cards;
2000 m 21
2000 n 21
2000 o 21
2000 p 21
2001 q 21
2001 r 21
2002 s 21
2002 t 21
2002 u 21
2002 v 50
2002 w 50
2002 x 60
;
RUN;
proc sort data=control;
by year industry control;
run;
data control;
set control;
by year industry control;
retain dup;
if first.industry then dup = 0;
dup = dup + 1;
run;
proc sort data=sample;
by year industry firm;
run;
data sample;
set sample;
by year industry firm;
retain dup;
if first.industry then dup = 0;
dup = dup + 1;
run;
data match;
if _n_ eq 1 then do;
if 0 then set sample control;
declare hash hc(dataset:'control');
hc.defineKey('year', 'industry', 'dup');
hc.defineData('control');
hc.defineDone();
end;
set sample;
drop rc dup;
rc = hc.find();
run;
Actually, given the data as is, it can be done just by using one simple merge operation:
data sample;
input
year firm $ industry;
cards;
2000 A 21
2000 B 21
2001 A 21
2002 A 21
2002 C 50
2002 D 60
;
RUN;
data control;
input year firm $ industry;
cards;
2000 m 21
2000 n 21
2000 o 21
2000 p 21
2001 q 21
2001 r 21
2002 s 21
2002 t 21
2002 u 21
2002 v 50
2002 w 50
2002 x 60
;
RUN;
data want;
merge sample control (rename=firm=control_firm);
by year industry ;
if not missing (firm) then output;
call missing(firm);
run;
proc print;run;
Regards,
Haikuo
Thank you all for the codes. I really appreciate for helping me out.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.