BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nzrdufiu
Obsidian | Level 7

Hi All:

I am very new and layman in SAS. I am learning SAS by doing. I am facing problem with matching. 

I have total 68 firms with CSR reports. I want to match those firms with the same SIC and nearest total assets. 

Sample firms:

Year     ticker     SIC     total_assets

2013     CA       7372     12016
2013     Flex      3672    12500.15
2013     TIF       5944     4752.351
2013     DELL   3571     92358
2013    ALK       5512    5838
2013    CRM     7372     9152.93
2013    LDOS    7373     4162
2013    HND1   3350     9114

....

.

..

.

.

Control firms:

Year      ticker     sic    total_assets

2013      m       5080      2136.9
2013     n        3670      2601.995
2013     v         3812      23.324
2013     p         4513       36.055
2013     q         3312       1877.6
2013     r          7510        5306.601
2013    t            2771      1583.463
2013     w         3844       281.449
2013     tt         7372      113.07
2013     y        7389       195.856
2013     AXR      7389        195.856

..

.

.

.

.

I want to have the output as follows:

Year     ticker     SIC     total_assets     match_firm

2013     CA       7372     12016                  n
2013     Flex      3672    12500.15            m
2013     TIF       5944     4752.351            p
2013     DELL   3571     92358                 t
2013    ALK       5512    5838                    q
2013    CRM     7372     9152.93              r
2013    LDOS    7373     4162                   tt

[This is my imaginary mattching].

 

I highly appreciate your help. 

 

Thank you in advance.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
kannand
Lapis Lazuli | Level 10

One way to do it after you have the data is using a dedup step using a sort:

 

Proc sort data=your dataset name nodupkeys;

   By sic total_assets;

Run;

 

The other way would be by eliminating the duplicates within the process sql step like a "select distinct".

Kannan Deivasigamani

View solution in original post

17 REPLIES 17
kannand
Lapis Lazuli | Level 10

Could you please elaborate on your match criteria ? 

 

"I want to match those firms with the same SIC and nearest total assets. "

Kannan Deivasigamani
nzrdufiu
Obsidian | Level 7

Thank you for your reply. 

I want to match firm on the basis of SIC and total_assets. 

nzrdufiu
Obsidian | Level 7

I have 68 sample firms that I want to mathc on the basis of SIC and total_assets. 

Sample firms=68

control sample=12,000

 

I want to have one for one--> for one sample firm, I want one control firm. If you like, I can send you the sample and control firms. Thank you for your concerns. 

I will be waiting to hear from you.

kannand
Lapis Lazuli | Level 10

Could you also please explain how you arrive at the output for a couple of rows specially the values in the right most column "m" and "n"

 

Year     ticker     SIC     total_assets     match_firm

2013     CA       7372     12016                  n
2013     Flex      3672    12500.15            m

Kannan Deivasigamani
nzrdufiu
Obsidian | Level 7

This is my imaginary matching. The output will look like this. I have total 68 sample firms and the control group has 12,000 firms. I want which firms match with the sample firm. 

Thank you for your concerns. 

kannand
Lapis Lazuli | Level 10

Okay. So, I assume you had a matching key with another row that had a "m"  value that you have shown in the sample result. Makes sense. 

 

1) Sort Sample & Control Datasets using the SIC & the total_assets variables. I hope you are able to code the DATA & SORT steps.

2) Merge using the same keys. 

 

If you like to have all rows from Control, then code "IF A" or "IF B" accordingly. 

 

That should take care of it...   Here is a shell.... for your reference

proc sort data=sample out=sorted_sample;
  by sic total_assets;
run;

proc sort data=control(keep=sic total_assets match_firm) out=sorted_control;
  by sic total_assets;
run;

data need;
  merge sorted_sample (in=a)
             sorted_control  (in=b);
        by sic total_assets;
  if a;
run;

Hope this helps.... Good Luck...!!!

Kannan Deivasigamani
nzrdufiu
Obsidian | Level 7

Thank you for your quick reply. Let me try as per your instruction. I will let you know. 

 

nzrdufiu
Obsidian | Level 7
After running your codes, I just get the sorting of sample firms. Am I missing something?
PGStats
Opal | Level 21

Can be done with:

 


proc sql;
create table want as
select s.*, c.ticker as match_firm, c.total_assets as match_assets
from sampleFirms as s inner join controlFirms as c
on s.sic=c.sic 
group by s.ticker
having abs(s.total_Assets-c.total_Assets) = min(abs(s.total_Assets-c.total_Assets));
quit;

Note: You have only one matching SIC in your data example.

PG
nzrdufiu
Obsidian | Level 7

Thank you PG.

nzrdufiu
Obsidian | Level 7

Hi 

I have run the codes and I got just the sample firms sorted. Maybe I am missing something. Would you please write detailed code with my sample firms? I highly appreciate your helps. Thank you.

 

PGStats
Opal | Level 21

Here is the same code, with your sample data

 

data sampleFirms;
input Year     ticker :$8.    SIC     total_assets;
datalines;
2013     CA       7372     12016
2013     Flex      3672    12500.15
2013     TIF       5944     4752.351
2013     DELL   3571     92358
2013    ALK       5512    5838
2013    CRM     7372     9152.93
2013    LDOS    7373     4162
2013    HND1   3350     9114
;

data controlFirms;
input Year     ticker :$8.    SIC     total_assets;
datalines;
2013      m       5080      2136.9
2013     n        3670      2601.995
2013     v         3812      23.324
2013     p         4513       36.055
2013     q         3312       1877.6
2013     r          7510        5306.601
2013    t            2771      1583.463
2013     w         3844       281.449
2013     tt         7372      113.07
2013     y        7389       195.856
2013     AXR      7389        195.856
;

proc sql;
create table want as
select s.*, c.ticker as match_firm, c.total_assets as match_assets
from sampleFirms as s inner join controlFirms as c
on s.sic=c.sic 
group by s.ticker
having abs(s.total_Assets-c.total_Assets) = min(abs(s.total_Assets-c.total_Assets));
select * from want;
quit;

PG
kannand
Lapis Lazuli | Level 10

The code that I presented is a direct match using SIC & total_assets while PGs code is using  SIC match with closest total_assets between the sample and control. 

 

If you are looking for matching SICs with unequal total_assets my code will not work. You neeed to go with PG's.

 

Can you present a sample of your code and output for troubleshooting with sample data that was used ?

Kannan Deivasigamani
nzrdufiu
Obsidian | Level 7

Thank you PG. Your code works pefect, but there is a little bit problem in the output. There are duplicates in the output. I am pasting the sample output for your review. 

SAS Output

Obs Year ticker SIC total_assets match_firm match_assets 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
20130521B87113056.40ARCAY2315.48
2013AA335035742.00NHYDY18997.81
2013ACM87115665.62AMFW3951.24
2013ADBE737210380.300139A9779.00
2013AHS7361592.23HSII553.22
2013AVB679815328.14SLG14959.00
2013AVB679815328.14SLG14959.00
2013AVB679815328.14SLG14959.00
2013AVB679815328.14SLG14959.00
2013AXTA28516737.10SHW6382.51
2013BIIB283611863.34GRFS8048.36
2013CA737212016.00CA11811.00
2013CCC2810590.08ATMI596.22
2013CCL440040104.00RCL20072.95
2013CI632454336.00AET49871.80
2013CI632454336.00AET49871.80
2013CI632454336.00AET49871.80
2013CI632454336.00AET49871.80

 Would you please tell me how can I get rid of those duplicates from the ouput? Thank you.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 3910 views
  • 3 likes
  • 4 in conversation