Re: Processing method

todd8325 · Posted 08-18-2014 01:02 PM

Has anyone written efficient code for something similar to this?

I have a dataset containing timestamp ( SAS datetime) , ID1 and ID2. Would like to calculate the number of distinct ID2's observed for ID1 for each observation within a certain timeframe. So for each observation, I have to count the distinct ID2's ( not including the current observation ) that occurred within say 7 days ( to the second ) for the ID1 on the transaction.

Thanks in advance for your suggestions.

ballardw · Posted 08-18-2014 04:58 PM

You should post some example data and expected results.

reneeharper · Posted 08-20-2014 09:25 AM

I'm going to move your post to a community where you may get more attention.

Patrick · Posted 08-20-2014 10:12 AM

What is "efficient code"? Is this about not using a lot of lines of code, or code which is easy to understand and maintain or is this code which performs well?

Please give us always the SAS version and the OS.

I agree with : Could you please provide some representative sample data (a data step creating such sample data) and then tell us how the result should look like?

If it is about performance then please tell us also what volumes we're dealing with, how often this needs to run etc. - and if it is "big data" then we would also need to understand your environment (memory available, I/0, is this a grid environment, ....).

todd8325 · Posted 08-20-2014 10:38 AM

Hi Patrick,

I am dealing with big data ( and sensitive data ) , so I can't post any actual data here, running 9.3 on Solaris. I think I have come up with a solution that works, but I wanted to see if this was a problem that someone had already solved previously, and a more compact solution was available. Sample data would be something like

ID1 ID2 DTTIME 7D_COUNT
1 1 03aug2014:11:52:57 0

1 2 03aug2014:13:48:47 1

1 2 10aug2014:11:44:58 2

1 2 10aug2014:13:20:51 1

Thanks

Todd

Patrick · Posted 08-20-2014 06:00 PM

Is your source data in SAS tables or a data base. And if it's a data base: Which one?

If it's SAS tables: Can we assume that you're data is pre-sorted like in your data sample? Or at least that for the same ID dates are in sequence (even if there are records in-between from other ID's)?

Reeza · Posted 08-20-2014 06:03 PM

Sounds like a readmission problem, though the exact business context may different.

Hopefully the appropriate term may at least help you narrow down solutions.

jakarman · Posted 08-21-2014 12:59 AM

Same question as Patrick as you set some requirements mentioning "big data".
- How many id1's do you have. Within that, how many id2's

- How big is this dataset.

- Is it ordered id1 id2 date? can it get ordered (sorted) seen as a stream
- What is you machine capacity memory processing dasd.

- Which SAS version are you using (9.3 9,4 or an old one like 9.1)

Those are the things for choosing memory based processing (hash) or batch sequential or indexed data.

How fast should the processing be. order of seconds-minutes hours or days?

The first impression would be using two record pointers with a dataset when this dataset is sorted like this and is a SAS dataset.

In that case it is one pass sequential access.

---->-- ja karman --<-----

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away