Hi All:
I am very new and layman in SAS. I am learning SAS by doing. I am facing problem with matching.
I have total 68 firms with CSR reports. I want to match those firms with the same SIC and nearest total assets.
Sample firms:
Year ticker SIC total_assets
2013 CA 7372 12016
2013 Flex 3672 12500.15
2013 TIF 5944 4752.351
2013 DELL 3571 92358
2013 ALK 5512 5838
2013 CRM 7372 9152.93
2013 LDOS 7373 4162
2013 HND1 3350 9114
....
.
..
.
.
Control firms:
Year ticker sic total_assets
2013 m 5080 2136.9
2013 n 3670 2601.995
2013 v 3812 23.324
2013 p 4513 36.055
2013 q 3312 1877.6
2013 r 7510 5306.601
2013 t 2771 1583.463
2013 w 3844 281.449
2013 tt 7372 113.07
2013 y 7389 195.856
2013 AXR 7389 195.856
..
.
.
.
.
I want to have the output as follows:
Year ticker SIC total_assets match_firm
2013 CA 7372 12016 n
2013 Flex 3672 12500.15 m
2013 TIF 5944 4752.351 p
2013 DELL 3571 92358 t
2013 ALK 5512 5838 q
2013 CRM 7372 9152.93 r
2013 LDOS 7373 4162 tt
[This is my imaginary mattching].
I highly appreciate your help.
Thank you in advance.
One way to do it after you have the data is using a dedup step using a sort:
Proc sort data=your dataset name nodupkeys;
By sic total_assets;
Run;
The other way would be by eliminating the duplicates within the process sql step like a "select distinct".
Could you please elaborate on your match criteria ?
"I want to match those firms with the same SIC and nearest total assets. "
Thank you for your reply.
I want to match firm on the basis of SIC and total_assets.
I have 68 sample firms that I want to mathc on the basis of SIC and total_assets.
Sample firms=68
control sample=12,000
I want to have one for one--> for one sample firm, I want one control firm. If you like, I can send you the sample and control firms. Thank you for your concerns.
I will be waiting to hear from you.
Could you also please explain how you arrive at the output for a couple of rows specially the values in the right most column "m" and "n"
Year ticker SIC total_assets match_firm
2013 CA 7372 12016 n
2013 Flex 3672 12500.15 m
This is my imaginary matching. The output will look like this. I have total 68 sample firms and the control group has 12,000 firms. I want which firms match with the sample firm.
Thank you for your concerns.
Okay. So, I assume you had a matching key with another row that had a "m" value that you have shown in the sample result. Makes sense.
1) Sort Sample & Control Datasets using the SIC & the total_assets variables. I hope you are able to code the DATA & SORT steps.
2) Merge using the same keys.
If you like to have all rows from Control, then code "IF A" or "IF B" accordingly.
That should take care of it... Here is a shell.... for your reference
proc sort data=sample out=sorted_sample;
by sic total_assets;
run;
proc sort data=control(keep=sic total_assets match_firm) out=sorted_control;
by sic total_assets;
run;
data need;
merge sorted_sample (in=a)
sorted_control (in=b);
by sic total_assets;
if a;
run;
Hope this helps.... Good Luck...!!!
Thank you for your quick reply. Let me try as per your instruction. I will let you know.
Can be done with:
proc sql;
create table want as
select s.*, c.ticker as match_firm, c.total_assets as match_assets
from sampleFirms as s inner join controlFirms as c
on s.sic=c.sic
group by s.ticker
having abs(s.total_Assets-c.total_Assets) = min(abs(s.total_Assets-c.total_Assets));
quit;
Note: You have only one matching SIC in your data example.
Thank you PG.
Hi kannand and PGStats:
I have run the codes and I got just the sample firms sorted. Maybe I am missing something. Would you please write detailed code with my sample firms? I highly appreciate your helps. Thank you.
Here is the same code, with your sample data
data sampleFirms;
input Year ticker :$8. SIC total_assets;
datalines;
2013 CA 7372 12016
2013 Flex 3672 12500.15
2013 TIF 5944 4752.351
2013 DELL 3571 92358
2013 ALK 5512 5838
2013 CRM 7372 9152.93
2013 LDOS 7373 4162
2013 HND1 3350 9114
;
data controlFirms;
input Year ticker :$8. SIC total_assets;
datalines;
2013 m 5080 2136.9
2013 n 3670 2601.995
2013 v 3812 23.324
2013 p 4513 36.055
2013 q 3312 1877.6
2013 r 7510 5306.601
2013 t 2771 1583.463
2013 w 3844 281.449
2013 tt 7372 113.07
2013 y 7389 195.856
2013 AXR 7389 195.856
;
proc sql;
create table want as
select s.*, c.ticker as match_firm, c.total_assets as match_assets
from sampleFirms as s inner join controlFirms as c
on s.sic=c.sic
group by s.ticker
having abs(s.total_Assets-c.total_Assets) = min(abs(s.total_Assets-c.total_Assets));
select * from want;
quit;
The code that I presented is a direct match using SIC & total_assets while PGs code is using SIC match with closest total_assets between the sample and control.
If you are looking for matching SICs with unequal total_assets my code will not work. You neeed to go with PG's.
Can you present a sample of your code and output for troubleshooting with sample data that was used ?
Thank you PG. Your code works pefect, but there is a little bit problem in the output. There are duplicates in the output. I am pasting the sample output for your review.
SAS Output
2013 | 0521B | 8711 | 3056.40 | ARCAY | 2315.48 |
2013 | AA | 3350 | 35742.00 | NHYDY | 18997.81 |
2013 | ACM | 8711 | 5665.62 | AMFW | 3951.24 |
2013 | ADBE | 7372 | 10380.30 | 0139A | 9779.00 |
2013 | AHS | 7361 | 592.23 | HSII | 553.22 |
2013 | AVB | 6798 | 15328.14 | SLG | 14959.00 |
2013 | AVB | 6798 | 15328.14 | SLG | 14959.00 |
2013 | AVB | 6798 | 15328.14 | SLG | 14959.00 |
2013 | AVB | 6798 | 15328.14 | SLG | 14959.00 |
2013 | AXTA | 2851 | 6737.10 | SHW | 6382.51 |
2013 | BIIB | 2836 | 11863.34 | GRFS | 8048.36 |
2013 | CA | 7372 | 12016.00 | CA | 11811.00 |
2013 | CCC | 2810 | 590.08 | ATMI | 596.22 |
2013 | CCL | 4400 | 40104.00 | RCL | 20072.95 |
2013 | CI | 6324 | 54336.00 | AET | 49871.80 |
2013 | CI | 6324 | 54336.00 | AET | 49871.80 |
2013 | CI | 6324 | 54336.00 | AET | 49871.80 |
2013 | CI | 6324 | 54336.00 | AET | 49871.80 |
Would you please tell me how can I get rid of those duplicates from the ouput? Thank you.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.