BookmarkSubscribeRSS Feed
wschnell1
Calcite | Level 5

I would appreciate any help with the following problem

I have two data sets:

Data set (1) contains transactions by different a sample of firms I am interested in

e.g.

Firmid             data               transaction price                   amount

42324df          1996/07/15           10.9                                   6000

42324df          1999/06/10           13.5                                   55000

42324df          2003/10/15           20.7                                   33000

23492fg          1997/06/10            6.9                                   10000

23492fg          1999/01/20            16.9                                  55000

23492fg          2000/07/10            2.50                                  210000

23492fg          2003/06/10            4.50                                 110000

Data set (2) contains the historical pricing data for the entire universe of firms on a daily basis for a long period of time.

e.g.

Firmid             data               transaction price

42324df          1991/01/01          5.00

...                    ....                    ...

...                    ...                    ...                         the ... represent further daily pricing data for the respective stocks

...                    ...                    ....

...                    ....                    ...

...                    ...                    ...

23492fg          1997/06/10           13.50                                    

...                    ....                    ...

...                    ...                    ...

...                    ...                    ....

...                    ....                    ...

...                    ...                    ...

12308hda          1995/02/10     150.00

...                    ....                    ...

...                    ...                    ...

...                    ...                    ....

...                    ....                    ...

...                    ...                    ...

I am trying to do the following:

  1. Only include firms in the merged data set if there is at least 200 days of pricing data (in dataset B) prior to each transaction in dataset A
  2. Once I have merged the two datasets my goal is to run regressions with return as the dependent variable and different firm characteristics which are contained in dataset (a) as the independent variables.

Any help is greatly appreciated

2 REPLIES 2
LinusH
Tourmaline | Level 20

For 1.

I would first set a consecutive no pricing days on each price date record in B. This could be fairly simple by sorting it by firmid and date, and then just augment a counter. Any other business rules need to taken care, such as if there is period with no pricing etc. If your data is large you could optimize by just output the records where your "pricing day counter" is 200 or more.

As step two, join A with the result above (and filter out where pricing days less than 200, if not done in the previous step).

Data never sleeps
UrvishShah
Fluorite | Level 6

Hi,

Here is my try...Hope it meets the requirement...

/*In Two Step*/

proc sql;

   create table reg_data(drop = temp) as

   select a.*,year(a.data) as temp,count(calculated temp) as tot_days

   from pop as a,pop as b

   where a.data= b.data

   group by a.year

   having tot_days GE 200

   order by a.data;

quit;

proc sql;

   create table reg_data as

   select a.*

   from reg_data as a right join sample as b

   on a.firmid = b.firmid;

quit;

/*In one step*/

proc sql;

   create table reg_data(drop = temp) as

   select c.*,year(a.data) as temp,count(calculated temp) as tot_days

   from pop as a        left join

        pop as b

   on a.data = b.data right join

        sample as c

   on a.firmid = c.firmid

   group by a.year

   having tot_days GE 200

   order by a.data;

quit;

-Urvish

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 449 views
  • 0 likes
  • 3 in conversation