Hi,
I have the following problem I have been struggling with. I have two data sets which I have merged. One is a set of historical pricing data and the other is the trading data for an individual.
I'll start by describing the data:
CUSIP - This is the stock id
Date - I have daily historical pricing data for each cusip (i.e. stock).
Price - This is the corresponding historical price for that date and stock
Shares - indicates the number of shares that the trader purchases
trandate - This is the date of the purchase and should mate the date
subsample - This is an indicator variable that is 1 for all days when date =trandate (i.e. this is when the trader traded)
cusip date price shares prices trandate subsample
1 222 1011985 5.15 . . . .
2 222 2011985 4.14 45 4.2 2011985 1
3 222 3011985 6.16 . . . .
4 222 4011985 20 . . . .
5 222 5011985 2.41 34 5.2 5011985 1
6 222 6011985 3.24 . . . .
7 222 7011985 5.25 . . . .
8 222 8011985 2.56 . . . .
9 222 9011985 2.25 . . . .
10 222 10011985 5.6 . . . .
11 333 1011985 3.52 . . . .
12 333 2011985 4.14 . . . .
13 333 3011985 7.23 . . . 1
14 333 4011985 1.45 . . . .
15 333 5011985 5.63 . . . .
16 333 6011985 5.13 . . . .
17 333 7011985 5.25 23 5.42 7011985 1
18 333 8011985 5.12 . . . .
19 333 9011985 2.34 . . . .
20 333 10011985 7.65 . . . .
21 444 01011985 5.40 . . . .
22 444 02011985 6.40 . . . .
I am trying to do the following things
Any help is greatly appreciated!!!
Hi,
If I've understood you correctly, how about:
1. Sort by cusip and descending subsample
2. Read the data using the by variables, only keeping the records if the subsample for the first.cusip was 1
3. Resort into original order if required.
Regards,
Amir.
Hi,
If I've understood you correctly, how about:
1. Sort by cusip and descending subsample
2. Read the data using the by variables, only keeping the records if the subsample for the first.cusip was 1
3. Resort into original order if required.
Regards,
Amir.
Amir's approach is pretty much how I would've gone naturally.
However, there are approaches that are more difficult to implement but far more performant than double sorting if your data set is tall and/or you wiish to repeat the process for many custommer merges. If performance of repeated sorting can become an issue, just reply and I'll try to propose an alternative that should take roughly 2 sequential reads of the data in the worst case scenario which alone is faster than a single sort.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.