SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Matching with control sample.

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

Matching with control sample.

Hi All:

I am very new and layman in SAS. I am learning SAS by doing. I am facing problem with matching. 

I have total 68 firms with CSR reports. I want to match those firms with the same SIC and nearest total assets. 

Sample firms:

Year     ticker     SIC     total_assets

2013     CA       7372     12016
2013     Flex      3672    12500.15
2013     TIF       5944     4752.351
2013     DELL   3571     92358
2013    ALK       5512    5838
2013    CRM     7372     9152.93
2013    LDOS    7373     4162
2013    HND1   3350     9114

....

.

..

.

.

Control firms:

Year      ticker     sic    total_assets

2013      m       5080      2136.9
2013     n        3670      2601.995
2013     v         3812      23.324
2013     p         4513       36.055
2013     q         3312       1877.6
2013     r          7510        5306.601
2013    t            2771      1583.463
2013     w         3844       281.449
2013     tt         7372      113.07
2013     y        7389       195.856
2013     AXR      7389        195.856

..

.

.

.

.

I want to have the output as follows:

Year     ticker     SIC     total_assets     match_firm

2013     CA       7372     12016                  n
2013     Flex      3672    12500.15            m
2013     TIF       5944     4752.351            p
2013     DELL   3571     92358                 t
2013    ALK       5512    5838                    q
2013    CRM     7372     9152.93              r
2013    LDOS    7373     4162                   tt

[This is my imaginary mattching].

 

I highly appreciate your help. 

 

Thank you in advance.

 

 


Accepted Solutions
Solution
‎11-22-2015 02:07 AM
Regular Contributor
Posts: 161

Re: Matching with control sample.

One way to do it after you have the data is using a dedup step using a sort:

 

Proc sort data=your dataset name nodupkeys;

   By sic total_assets;

Run;

 

The other way would be by eliminating the duplicates within the process sql step like a "select distinct".

Kannan Deivasigamani

View solution in original post


All Replies
Regular Contributor
Posts: 161

Re: Matching with control sample.

Could you please elaborate on your match criteria ? 

 

"I want to match those firms with the same SIC and nearest total assets. "

Kannan Deivasigamani
Contributor
Posts: 20

Re: Matching with control sample.

Thank you for your reply. 

I want to match firm on the basis of SIC and total_assets. 

Contributor
Posts: 20

Re: Matching with control sample.

I have 68 sample firms that I want to mathc on the basis of SIC and total_assets. 

Sample firms=68

control sample=12,000

 

I want to have one for one--> for one sample firm, I want one control firm. If you like, I can send you the sample and control firms. Thank you for your concerns. 

I will be waiting to hear from you.

Regular Contributor
Posts: 161

Re: Matching with control sample.

Could you also please explain how you arrive at the output for a couple of rows specially the values in the right most column "m" and "n"

 

Year     ticker     SIC     total_assets     match_firm

2013     CA       7372     12016                  n
2013     Flex      3672    12500.15            m

Kannan Deivasigamani
Contributor
Posts: 20

Re: Matching with control sample.

This is my imaginary matching. The output will look like this. I have total 68 sample firms and the control group has 12,000 firms. I want which firms match with the sample firm. 

Thank you for your concerns. 

Regular Contributor
Posts: 161

Re: Matching with control sample.

[ Edited ]

Okay. So, I assume you had a matching key with another row that had a "m"  value that you have shown in the sample result. Makes sense. 

 

1) Sort Sample & Control Datasets using the SIC & the total_assets variables. I hope you are able to code the DATA & SORT steps.

2) Merge using the same keys. 

 

If you like to have all rows from Control, then code "IF A" or "IF B" accordingly. 

 

That should take care of it...   Here is a shell.... for your reference

proc sort data=sample out=sorted_sample;
  by sic total_assets;
run;

proc sort data=control(keep=sic total_assets match_firm) out=sorted_control;
  by sic total_assets;
run;

data need;
  merge sorted_sample (in=a)
             sorted_control  (in=b);
        by sic total_assets;
  if a;
run;

Hope this helps.... Good Luck...!!!

Kannan Deivasigamani
Contributor
Posts: 20

Re: Matching with control sample.

Thank you for your quick reply. Let me try as per your instruction. I will let you know. 

 

Contributor
Posts: 20

Re: Matching with control sample.

After running your codes, I just get the sorting of sample firms. Am I missing something?
Respected Advisor
Posts: 4,934

Re: Matching with control sample.

Can be done with:

 


proc sql;
create table want as
select s.*, c.ticker as match_firm, c.total_assets as match_assets
from sampleFirms as s inner join controlFirms as c
on s.sic=c.sic 
group by s.ticker
having abs(s.total_Assets-c.total_Assets) = min(abs(s.total_Assets-c.total_Assets));
quit;

Note: You have only one matching SIC in your data example.

PG
Contributor
Posts: 20

Re: Matching with control sample.

Thank you PG.

Contributor
Posts: 20

Re: Matching with control sample.

Hi 

I have run the codes and I got just the sample firms sorted. Maybe I am missing something. Would you please write detailed code with my sample firms? I highly appreciate your helps. Thank you.

 

Respected Advisor
Posts: 4,934

Re: Matching with control sample.

Here is the same code, with your sample data

 

data sampleFirms;
input Year     ticker :$8.    SIC     total_assets;
datalines;
2013     CA       7372     12016
2013     Flex      3672    12500.15
2013     TIF       5944     4752.351
2013     DELL   3571     92358
2013    ALK       5512    5838
2013    CRM     7372     9152.93
2013    LDOS    7373     4162
2013    HND1   3350     9114
;

data controlFirms;
input Year     ticker :$8.    SIC     total_assets;
datalines;
2013      m       5080      2136.9
2013     n        3670      2601.995
2013     v         3812      23.324
2013     p         4513       36.055
2013     q         3312       1877.6
2013     r          7510        5306.601
2013    t            2771      1583.463
2013     w         3844       281.449
2013     tt         7372      113.07
2013     y        7389       195.856
2013     AXR      7389        195.856
;

proc sql;
create table want as
select s.*, c.ticker as match_firm, c.total_assets as match_assets
from sampleFirms as s inner join controlFirms as c
on s.sic=c.sic 
group by s.ticker
having abs(s.total_Assets-c.total_Assets) = min(abs(s.total_Assets-c.total_Assets));
select * from want;
quit;

PG
Regular Contributor
Posts: 161

Re: Matching with control sample.

The code that I presented is a direct match using SIC & total_assets while PGs code is using  SIC match with closest total_assets between the sample and control. 

 

If you are looking for matching SICs with unequal total_assets my code will not work. You neeed to go with PG's.

 

Can you present a sample of your code and output for troubleshooting with sample data that was used ?

Kannan Deivasigamani
Contributor
Posts: 20

Re: Matching with control sample.

Thank you PG. Your code works pefect, but there is a little bit problem in the output. There are duplicates in the output. I am pasting the sample output for your review. 

SAS Output

Obs Year ticker SIC total_assets match_firm match_assets 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
20130521B87113056.40ARCAY2315.48
2013AA335035742.00NHYDY18997.81
2013ACM87115665.62AMFW3951.24
2013ADBE737210380.300139A9779.00
2013AHS7361592.23HSII553.22
2013AVB679815328.14SLG14959.00
2013AVB679815328.14SLG14959.00
2013AVB679815328.14SLG14959.00
2013AVB679815328.14SLG14959.00
2013AXTA28516737.10SHW6382.51
2013BIIB283611863.34GRFS8048.36
2013CA737212016.00CA11811.00
2013CCC2810590.08ATMI596.22
2013CCL440040104.00RCL20072.95
2013CI632454336.00AET49871.80
2013CI632454336.00AET49871.80
2013CI632454336.00AET49871.80
2013CI632454336.00AET49871.80

 Would you please tell me how can I get rid of those duplicates from the ouput? Thank you.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 1285 views
  • 3 likes
  • 4 in conversation