BookmarkSubscribeRSS Feed
todd8325
Calcite | Level 5

Has anyone written efficient code for something similar to this?

I have a dataset containing  timestamp ( SAS datetime) ,  ID1 and ID2.   Would like to calculate the number of distinct ID2's observed for ID1 for each observation within a certain timeframe.  So for each observation, I have to count the distinct ID2's ( not including the current observation ) that occurred within say 7 days ( to the second ) for the ID1 on the transaction.

Thanks in advance for your suggestions.

7 REPLIES 7
ballardw
Super User

You should post some example data and expected results.

reneeharper
SAS Employee

I'm going to move your post to a community where you may get more attention. 

Patrick
Opal | Level 21

What is "efficient code"? Is this about not using a lot of lines of code, or code which is easy to understand and maintain or is this code which performs well?

Please give us always the SAS version and the OS.

I agree with : Could you please provide some representative sample data (a data step creating such sample data) and then tell us how the result should look like?

If it is about performance then please tell us also what volumes we're dealing with, how often this needs to run etc. - and if it is "big data" then we would also need to understand your environment (memory available, I/0, is this a grid environment, ....).

todd8325
Calcite | Level 5

Hi Patrick,

I am dealing with big data ( and sensitive data ) , so I can't post any actual data here, running 9.3 on Solaris.   I think I have come up with a solution that works, but I wanted to see if this was a problem that someone had already solved previously, and a more compact solution was available.  Sample data would be something like

ID1     ID2     DTTIME                         7D_COUNT
1        1        03aug2014:11:52:57        0

1        2        03aug2014:13:48:47        1

1        2        10aug2014:11:44:58        2

1        2        10aug2014:13:20:51        1

Thanks

Todd

Patrick
Opal | Level 21

Is your source data in SAS tables or a data base. And if it's a data base: Which one?

If it's SAS tables: Can we assume that you're data is pre-sorted like in your data sample? Or at least that for the same ID dates are in sequence (even if there are records in-between from other ID's)?

Reeza
Super User

Sounds like a readmission problem, though the exact business context may different.

Hopefully the appropriate term may at least help you narrow down solutions.

jakarman
Barite | Level 11

Same question as Patrick as you set some requirements mentioning "big data".
- How many id1's do you have. Within that, how many id2's

- How big is this dataset.

- Is it ordered  id1 id2 date? can it get ordered (sorted) seen as a stream
- What is you machine capacity  memory processing  dasd.

- Which SAS version are you using (9.3  9,4 or an old one like 9.1)

Those are the things for choosing memory based processing (hash) or batch sequential or indexed data.

How fast should the processing be. order of seconds-minutes hours or days?


The first impression would be using two record pointers with a dataset when this dataset is sorted like this and is a SAS dataset.

In that case it is one pass sequential access.

---->-- ja karman --<-----

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1506 views
  • 0 likes
  • 6 in conversation