BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SteveLee
Calcite | Level 5

I have 2 datasets test firms and control firms. I need to match these two datasets based on same fiscal year and same sic code and the minimum difference in their size

Here is my SAS code:

PROC SQL;

create table match  as

select a.* , b.* , abs(a.size - b.size) as diff

from control a, test b

where a. fyear = b. fyear and a. sic = b. sic

group by b.fyear ,b.sic, b.cik

having diff=min(diff);

PROC print data=match;

run;

The problem is that my SAS code works like matching with replacement. I need matching without replacement. So once SAS program finds a matched firm, it cannot use the same firm to be matched another test firm. (in the same fiscal year & same sic code) 

Any comments or help would be appreciated. Thank you

Best,

Steve

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

You can't do it via SQL. use Hash Table .


Code: Program


data test;
set MERGEDATA_5(keep=companies fyr sic1 sizes);
run;
data control;
set MERGEDATA_6(keep=companies fyr sic1 sizes
   rename=(companies=_companies fyr=fyr sic1=sic1 sizes =_sizes));
k+1;
run;


proc sort data=test; by fyr sic1 sizes;run;
data want;
if _n_ eq 1 then do;
  if 0 then set control;
  declare hash h(dataset:'control',multidata:'y');
  h.definekey('fyr','sic1');
  h.definedata('_companies','_sizes','k');
  h.definedone();
end;
call missing(of _all_);
set test;
n=0;
min=9999999;
rc=h.find();
do while(rc=0);
abs=abs(sizes-_sizes);
if abs lt min then do;n=k;min=abs;end;
rc=h.find_next();
end;
rc=h.find();
do while(rc=0);
if n=k then do;h.removedup();leave;end;
rc=h.find_next();
end;
drop rc n min abs k;
run;








Xia Keshan

Message was edited by: xia keshan Fix a problem

View solution in original post

4 REPLIES 4
Ksharp
Super User

You missed a keyword CALCULATED .

PROC SQL;

create table match  as

select a.* , b.* , abs(a.size - b.size) as diff

from control a, test b

where a. fyear = b. fyear and a. sic = b. sic

group by b.fyear ,b.sic, b.cik

having calculated diff=min(calculated diff);

run;

and list all the variable you need in SELECT, not like ; a.*,b.* . You will got ERROR info if a and b have the same variable  name.

Xia Keshan

SteveLee
Calcite | Level 5

I really appreciate your answer. Unfortunately, your suggestion did not work for all the cases. I think it's still working as 'matching with replacement' I mentioned above. Could you give me another way to solve this problem?

Best,

Steve

SteveLee
Calcite | Level 5


                                                          Matching with replacement   /  Matching without replacement

Test firm                                             Matching firm

   sic year         

A  22  2009    Size=$42billion             C 22 2009 Size=41billion  /  C 22 2009 Size=41billion

B  22  2009    Size=$40billion             C 22 2009 Size=41billion  /  D 22 2009 Size=41.5 billion

To create a matched sample, I need to find a matched firm for 'A' and another for 'B'. If I'm matching on specific year (2009), I need to find with closest to 50billion for 'A' from the same sic code in 2009. Again the same for 'B'

I give an specific example of 'Matching with replacement' and 'Matching without replacement'. My desire output is 'Matching without replacement'

This means that once SAS program finds a matched firm, it cannot use the same firm to be matched another test firm. (in the same fiscal year & same sic code).

PROC SQL;
create table matching1  as
select a.*, b.*, abs(a.size - b.sizes) as diff
from mergedata_6 a, mergedata_5 b
where a. fyear = b. fyr and a. sic = b. sic1
group by b.fyr ,b.sic1, b.ciks
having calculated diff=min(calculated diff);
PROC print data=matching1;
run;

However, my SAS code just works as 'Matching with replacement'. Please give some help for solving this problem.

Thank you

Best,
Steve

Ksharp
Super User

You can't do it via SQL. use Hash Table .


Code: Program


data test;
set MERGEDATA_5(keep=companies fyr sic1 sizes);
run;
data control;
set MERGEDATA_6(keep=companies fyr sic1 sizes
   rename=(companies=_companies fyr=fyr sic1=sic1 sizes =_sizes));
k+1;
run;


proc sort data=test; by fyr sic1 sizes;run;
data want;
if _n_ eq 1 then do;
  if 0 then set control;
  declare hash h(dataset:'control',multidata:'y');
  h.definekey('fyr','sic1');
  h.definedata('_companies','_sizes','k');
  h.definedone();
end;
call missing(of _all_);
set test;
n=0;
min=9999999;
rc=h.find();
do while(rc=0);
abs=abs(sizes-_sizes);
if abs lt min then do;n=k;min=abs;end;
rc=h.find_next();
end;
rc=h.find();
do while(rc=0);
if n=k then do;h.removedup();leave;end;
rc=h.find_next();
end;
drop rc n min abs k;
run;








Xia Keshan

Message was edited by: xia keshan Fix a problem

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4420 views
  • 0 likes
  • 2 in conversation