Dear Members:
I am new to research as well as to SAS. I am having the following problem. I highly appreciate your attempts. [ I have basic ideas about base sas and proc sql].
I have two datasets:
Sample Dataset:
Company SIC ROE TA
AA 200 2% 500
BB 200 5% 600
CC 300 7% 400
DD 400 8.5% 300
EE 400 9.5% 345
.
.
.
.
.
Control dataset:
Company SIC ROE TA
XX 200 3% 550
YY 200 5.4% 680
ZZ 300 7.5% 500
PP 400 6.5% 600
SS 400 5.5% 745
.
.
..
.
.
I need at least two companies from control dataset for every single sample company from sample dataset. Please let me know sample codes. The codes my be in proc sql or base sas format. Thank you.
Untested, will do all matches and you'll need to filter them down.
proc sql;
create table want as
select a.*, b.company_id as control_id, b.roe as control_roe, b.ta as control_ta
from table1 as a
left join table2 as b
on a.sic=b.sic
and (a.roe-b.roe)/a.roe between -0.2 and 0.2
and (a.ta - b.ta)/a.ta between -.02 and 0.2; /*0.2 is 20%*/
quit;
Dear Members:
I am new to research as well as to SAS. I am having the following problem. I highly appreciate your attempts. [ I have basic ideas about base sas and proc sql].
I have two datasets:
Sample Dataset:
Company SIC ROE TA
AA 200 2% 500
BB 200 5% 600
CC 300 7% 400
DD 400 8.5% 300
EE 400 9.5% 345
.
.
.
.
.
Control dataset:
Company SIC ROE TA
XX 200 3% 550
YY 200 5.4% 680
ZZ 300 7.5% 500
PP 400 6.5% 600
SS 400 5.5% 745
.
.
..
.
.
I need at least two companies from control dataset for every single sample company from sample dataset. Please let me know sample codes. The codes my be in proc sql or base sas format. Thank you.
Can you post the output you want see ? And what do you mean by "I need at least two companies from control dataset " ? Is there any rules to pick up these at least two companies ?
What's your criteria for 'matching'?
My matching should be on the basis of SIC, ROE, and TA. TA and ROE maybe within a range, say 25% or 50%.
And can a company be matched multiple times? You really need to specify your criteria more clearly.
If you search for propensity matching you'll find algorithm and code examples online.
Thank you for your reply.
I need at least two companies for every sample company, and I want to do that on the basis of ROE, TA, and SIC.
say for example, for company AA, I need company XX and YY.
Thank you for your inputs.
Untested, will do all matches and you'll need to filter them down.
proc sql;
create table want as
select a.*, b.company_id as control_id, b.roe as control_roe, b.ta as control_ta
from table1 as a
left join table2 as b
on a.sic=b.sic
and (a.roe-b.roe)/a.roe between -0.2 and 0.2
and (a.ta - b.ta)/a.ta between -.02 and 0.2; /*0.2 is 20%*/
quit;
data sample;
input company$ sic roe ta;
datalines;
AA 100 5.5 100
BB 100 5.6 200
CC 200 6.5 300
DD 400 5.8 300
EE 400 8.9 500
;
run;
proc print data=sample;
run;
data control;
input company$ sic roe ta;
datalines;
XX 100 5.4 100
YY 100 5.9 200
DE 100 5.6 200
RR 200 6.4 300
RE 200 7.5 400
RT 300 6.5 450
TT 400 5.9 300
TR 400 9.8 500
;
run;
proc print data=Control;
run;
proc sql;
create table want as
select a.*, b.company as control, b.roe as control_roe, b.ta as control_ta
from sample as a
left join control as b
on a.sic=b.sic
and (a.roe-b.roe)/a.roe between -0.5 and 0.7
and (a.ta - b.ta)/a.ta between -.02 and 0.2; /*0.2 is 20%*/
quit;
proc print data=want;
run;
I have run the above code and found the following:
The SAS System
Obs company sic roe ta control control_roe control_ta
1 AA 100 5.5 100 XX 5.4 100
2 BB 100 5.6 200 YY 5.9 200
3 CC 200 6.5 300 RR 6.4 300
4 DD 400 5.8 300 TT 5.9 300
5 EE 400 8.9 500 TR 9.8 500
But I want, for example, for company BB, match companies will be YY and DE. That codes give me only YY for BB. How can I have both YY and DE for BB.
Thank you for your efforts.
Must not be within the tolerance levels you set. What is the percent diff for your matching variables?
Edit: Yes, it's outside the tolerance levels set. Modify to your requirements.
Should I ignore the tolerance level such as 50% or 20%?
@Khadem wrote:
Should I ignore the tolerance level such as 50% or 20%?
It's your specifications, not mine. I assume there's some rationale to your boundaries.
The company in Q was -5%, the interval used was -2 to 20%.
Thank you. Let me try with some modifications. I will let you know.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.