DATA Step, Macro, Functions and more

Processing method

Reply
Occasional Contributor
Posts: 5

Processing method

Has anyone written efficient code for something similar to this?

I have a dataset containing  timestamp ( SAS datetime) ,  ID1 and ID2.   Would like to calculate the number of distinct ID2's observed for ID1 for each observation within a certain timeframe.  So for each observation, I have to count the distinct ID2's ( not including the current observation ) that occurred within say 7 days ( to the second ) for the ID1 on the transaction.

Thanks in advance for your suggestions.

Super User
Posts: 10,500

Re: Processing method

You should post some example data and expected results.

SAS Employee
Posts: 224

Re: Processing method

I'm going to move your post to a community where you may get more attention. 

Respected Advisor
Posts: 3,892

Re: Processing method

What is "efficient code"? Is this about not using a lot of lines of code, or code which is easy to understand and maintain or is this code which performs well?

Please give us always the SAS version and the OS.

I agree with : Could you please provide some representative sample data (a data step creating such sample data) and then tell us how the result should look like?

If it is about performance then please tell us also what volumes we're dealing with, how often this needs to run etc. - and if it is "big data" then we would also need to understand your environment (memory available, I/0, is this a grid environment, ....).

Occasional Contributor
Posts: 5

Re: Processing method

Hi Patrick,

I am dealing with big data ( and sensitive data ) , so I can't post any actual data here, running 9.3 on Solaris.   I think I have come up with a solution that works, but I wanted to see if this was a problem that someone had already solved previously, and a more compact solution was available.  Sample data would be something like

ID1     ID2     DTTIME                         7D_COUNT
1        1        03aug2014:11:52:57        0

1        2        03aug2014:13:48:47        1

1        2        10aug2014:11:44:58        2

1        2        10aug2014:13:20:51        1

Thanks

Todd

Respected Advisor
Posts: 3,892

Re: Processing method

Is your source data in SAS tables or a data base. And if it's a data base: Which one?

If it's SAS tables: Can we assume that you're data is pre-sorted like in your data sample? Or at least that for the same ID dates are in sequence (even if there are records in-between from other ID's)?

Super User
Posts: 17,831

Re: Processing method

Sounds like a readmission problem, though the exact business context may different.

Hopefully the appropriate term may at least help you narrow down solutions.

Valued Guide
Posts: 3,208

Re: Processing method

Same question as Patrick as you set some requirements mentioning "big data".
- How many id1's do you have. Within that, how many id2's

- How big is this dataset.

- Is it ordered  id1 id2 date? can it get ordered (sorted) seen as a stream
- What is you machine capacity  memory processing  dasd.

- Which SAS version are you using (9.3  9,4 or an old one like 9.1)

Those are the things for choosing memory based processing (hash) or batch sequential or indexed data.

How fast should the processing be. order of seconds-minutes hours or days?


The first impression would be using two record pointers with a dataset when this dataset is sorted like this and is a SAS dataset.

In that case it is one pass sequential access.

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 7 replies
  • 316 views
  • 0 likes
  • 6 in conversation