Solved: dates and certain observations

wschnell1 · Posted 11-18-2013 10:39 PM

Hi,

I have the following problem I have been struggling with. I have two data sets which I have merged. One is a set of historical pricing data and the other is the trading data for an individual.

I'll start by describing the data:

CUSIP - This is the stock id

Date - I have daily historical pricing data for each cusip (i.e. stock).

Price - This is the corresponding historical price for that date and stock

Shares - indicates the number of shares that the trader purchases

trandate - This is the date of the purchase and should mate the date

subsample - This is an indicator variable that is 1 for all days when date =trandate (i.e. this is when the trader traded)

cusip date price shares prices trandate subsample

1 222 1011985 5.15 . . . .

2 222 2011985 4.14 45 4.2 2011985 1

3 222 3011985 6.16 . . . .

4 222 4011985 20 . . . .

5 222 5011985 2.41 34 5.2 5011985 1

6 222 6011985 3.24 . . . .

7 222 7011985 5.25 . . . .

8 222 8011985 2.56 . . . .

9 222 9011985 2.25 . . . .

10 222 10011985 5.6 . . . .

11 333 1011985 3.52 . . . .

12 333 2011985 4.14 . . . .

13 333 3011985 7.23 . . . 1

14 333 4011985 1.45 . . . .

15 333 5011985 5.63 . . . .

16 333 6011985 5.13 . . . .

17 333 7011985 5.25 23 5.42 7011985 1

18 333 8011985 5.12 . . . .

19 333 9011985 2.34 . . . .

20 333 10011985 7.65 . . . .

21 444 01011985 5.40 . . . .

22 444 02011985 6.40 . . . .

I am trying to do the following things

I want to create a new sub sample which remove all the data for a security when there is not a single trader transaction over the entire period (i.e. the last 4 columns never display any data) for that security. In the case of the data above this would mean keeping all the data except for the final two lines because the trader never traded security 444.

Any help is greatly appreciated!!!

Amir · Posted 11-19-2013 08:04 AM

Hi,

If I've understood you correctly, how about:

1. Sort by cusip and descending subsample

2. Read the data using the by variables, only keeping the records if the subsample for the first.cusip was 1

3. Resort into original order if required.

Regards,

Amir.

View solution in original post

Amir · Posted 11-19-2013 08:04 AM

Hi,

If I've understood you correctly, how about:

1. Sort by cusip and descending subsample

2. Read the data using the by variables, only keeping the records if the subsample for the first.cusip was 1

3. Resort into original order if required.

Regards,

Amir.

Vince28_Statcan · Posted 11-19-2013 08:31 AM

Amir's approach is pretty much how I would've gone naturally.

However, there are approaches that are more difficult to implement but far more performant than double sorting if your data set is tall and/or you wiish to repeat the process for many custommer merges. If performance of repeated sorting can become an issue, just reply and I'll try to propose an alternative that should take roughly 2 sequential reads of the data in the worst case scenario which alone is faster than a single sort.

dates and certain observations

Re: dates and certain observations

Re: dates and certain observations

Re: dates and certain observations

dates and certain observations

Re: dates and certain observations

Re: dates and certain observations

Re: dates and certain observations

Click image to register for webinar

Classroom Training Available!