## merging data sets

Occasional Contributor
Posts: 15

# merging data sets

I would appreciate any help with the following problem

I have two data sets:

Data set (1) contains transactions by different a sample of firms I am interested in

e.g.

Firmid             data               transaction price                   amount

42324df          1996/07/15           10.9                                   6000

42324df          1999/06/10           13.5                                   55000

42324df          2003/10/15           20.7                                   33000

23492fg          1997/06/10            6.9                                   10000

23492fg          1999/01/20            16.9                                  55000

23492fg          2000/07/10            2.50                                  210000

23492fg          2003/06/10            4.50                                 110000

Data set (2) contains the historical pricing data for the entire universe of firms on a daily basis for a long period of time.

e.g.

Firmid             data               transaction price

42324df          1991/01/01          5.00

...                    ....                    ...

...                    ...                    ...                         the ... represent further daily pricing data for the respective stocks

...                    ...                    ....

...                    ....                    ...

...                    ...                    ...

23492fg          1997/06/10           13.50

...                    ....                    ...

...                    ...                    ...

...                    ...                    ....

...                    ....                    ...

...                    ...                    ...

12308hda          1995/02/10     150.00

...                    ....                    ...

...                    ...                    ...

...                    ...                    ....

...                    ....                    ...

...                    ...                    ...

I am trying to do the following:

1. Only include firms in the merged data set if there is at least 200 days of pricing data (in dataset B) prior to each transaction in dataset A
2. Once I have merged the two datasets my goal is to run regressions with return as the dependent variable and different firm characteristics which are contained in dataset (a) as the independent variables.

Any help is greatly appreciated

Super User
Posts: 5,890

## Re: merging data sets

For 1.

I would first set a consecutive no pricing days on each price date record in B. This could be fairly simple by sorting it by firmid and date, and then just augment a counter. Any other business rules need to taken care, such as if there is period with no pricing etc. If your data is large you could optimize by just output the records where your "pricing day counter" is 200 or more.

As step two, join A with the result above (and filter out where pricing days less than 200, if not done in the previous step).

Data never sleeps
Regular Contributor
Posts: 195

## Re: merging data sets

Hi,

Here is my try...Hope it meets the requirement...

/*In Two Step*/

proc sql;

create table reg_data(drop = temp) as

select a.*,year(a.data) as temp,count(calculated temp) as tot_days

from pop as a,pop as b

where a.data= b.data

group by a.year

having tot_days GE 200

order by a.data;

quit;

proc sql;

create table reg_data as

select a.*

from reg_data as a right join sample as b

on a.firmid = b.firmid;

quit;

/*In one step*/

proc sql;

create table reg_data(drop = temp) as

select c.*,year(a.data) as temp,count(calculated temp) as tot_days

from pop as a        left join

pop as b

on a.data = b.data right join

sample as c

on a.firmid = c.firmid

group by a.year

having tot_days GE 200

order by a.data;

quit;

-Urvish

Discussion stats
• 2 replies
• 218 views
• 0 likes
• 3 in conversation