BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wschnell1
Calcite | Level 5

Hi,


I have the following problem I have been struggling with. I have two data sets which I have merged. One is a set of historical pricing data and the other is the trading data for an individual.


I'll start by describing the data:


CUSIP - This is the stock id

Date     - I have daily historical pricing data for each cusip (i.e. stock).

Price    - This is the corresponding historical price for that date and stock

Shares - indicates the number of shares that the trader purchases

trandate - This is the date of the purchase and should mate the date

subsample - This is an indicator variable that is 1 for all  days when date =trandate (i.e. this is when the trader traded)



     cusip           date           price      shares      prices      trandate      subsample

1      222      1011985           5.15           .           .                .                          .

2      222      2011985           4.14           45           4.2           2011985        1

3      222      3011985            6.16           .           .                .                         .

4      222      4011985           20                .           .                .                         .

5      222      5011985           2.41           34           5.2           5011985         1

6      222      6011985           3.24           .                .                .                     .

7      222      7011985           5.25           .                .                .                     .

8      222      8011985           2.56           .                .                .                     .

9      222      9011985           2.25           .                .                .                     .

10     222      10011985           5.6           .                .                .                    .

11      333     1011985           3.52           .                .                .                    .

12      333      2011985           4.14           .                .                .                   .

13      333      3011985           7.23           .                .                .                   1

14      333      4011985           1.45           .                .                .                    .

15      333      5011985           5.63           .                .                .                    .

16      333      6011985           5.13           .                .                .                    .

17      333      7011985           5.25           23           5.42      7011985          1

18      333      8011985           5.12           .                 .                .                    .

19      333      9011985           2.34           .                  .                .                   .

20      333      10011985      7.65                .                .                .                   .

21      444        01011985             5.40          .                    .               .             .

22      444        02011985              6.40          .                    .               .             .


I am trying to do the following things

  • I want to create a new sub sample which remove all the data for a security when there is not a single trader transaction over the entire period (i.e. the last 4 columns never display any data) for that security. In the case of the data above this would mean keeping all the data except for the final two lines because the trader never traded security 444.

Any help is greatly appreciated!!!

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

Hi,

If I've understood you correctly, how about:

1. Sort by cusip and descending subsample

2. Read the data using the by variables, only keeping the records if the subsample for the first.cusip was 1

3. Resort into original order if required.

Regards,

Amir.

View solution in original post

2 REPLIES 2
Amir
PROC Star

Hi,

If I've understood you correctly, how about:

1. Sort by cusip and descending subsample

2. Read the data using the by variables, only keeping the records if the subsample for the first.cusip was 1

3. Resort into original order if required.

Regards,

Amir.

Vince28_Statcan
Quartz | Level 8

Amir's approach is pretty much how I would've gone naturally.

However, there are approaches that are more difficult to implement but far more performant than double sorting if your data set is tall and/or you wiish to repeat the process for many custommer merges. If performance of repeated sorting can become an issue, just reply and I'll try to propose an alternative that should take roughly 2 sequential reads of the data in the worst case scenario which alone is faster than a single sort.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 563 views
  • 3 likes
  • 3 in conversation