BookmarkSubscribeRSS Feed
mei
Calcite | Level 5 mei
Calcite | Level 5

I have 2 datasets, one is small sample of 741 firms and I need to find a match for each firm from a dataset of 8330 firms using criteria of same financial year (fyear) and same industry (ffind) and take the firm with approximately the same size.

I  have used the following sas codes to select matched pair. Now i found out that a same firm can be selected several times as a matched pair for the original dataset of 741 firms. How do I avoid the same firm to be selected again?

proc sort data=huang_741a;

by gvkey fyear;

run;

proc sort data=huang_bal_04a;

by gvkey fyear;

run;

proc sql;

create table control_741 as

select O.*, A.gvkey as Anum, A.fyear as Afy, abs(O.size-A.size) as sizeDiff

from huang_bal_04a as O inner join huang_741a as A

     on O.fYear=A.fyear and O.ffind=A.ffind

where O.gvkey not in (select gvkey from huang_741a)

    order by gvkey;

    *27001 rows**;

proc sort data=control_741;

by anum afy;

run;

proc means data=control_741 noprint;

by Anum afy;

output out=selected_741 idgroup(min(sizeDiff) out[1] (fyear gvkey)=);

run;

**741**;

proc sql;

create table selected_741A as

select a.*, b.*

from selected_741 a left join  huang_bal_04a b

on a.gvkey=b.gvkey and a.fyear=b.fYear

order by a.gvkey, a.fyear;

run;

**597**;

data combined_741;

set huang_741a selected_741A;

run;

**741+741=1282**;

5 REPLIES 5
LearnByMistk
Obsidian | Level 7

can u give some data sample and try to explain with help of that what you need.

mei
Calcite | Level 5 mei
Calcite | Level 5

I have attached two files: huang_741 and huang_bal_04a,

I have now created 5 sets of matched pair for each firms using the codes as follows:

proc means data=control_total741 noprint;

by Anum afy;

output out=selected_total741_5set idgroup(min(sizeDiff) out[5] (fyear gvkey)=);

run;

**741**;

I have attached the file: selected_741_5sets.

I was thinking maybe it is good to look at this file. If 1 firms are selected three times, gvkey_1 will repeat for three times. then i should choose gvkey_2 and gvkey_3 for the second and third duplicates. Instead of going through manually (there are 71 duplicates), do you think is it a good way to write some codes to tell the program to choose correctly?

Thanks for your time.

drsurf
Calcite | Level 5

It would be helpful if you also attached the huang_bal_04a dataset.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Well, in answer to your question "How do I avoid the same firm to be selected again?", that's a fairly simple response, look at your data, isolate why more than one record matches based on your join criteria, then evaluate a method to reduce that, maybe using distinct, or first date or something else.  So look at each step and identify why you get multiple records would be the first stage.  Providing example data, and narrowing down which procedure is causing the problem would be beneficial.

Ksharp
Super User

It would be better to post a little sample data and the output you need . Explain it as few words as you could.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 989 views
  • 0 likes
  • 5 in conversation