BookmarkSubscribeRSS Feed
Jamie_H
Fluorite | Level 6

Id just like some advice and pointers on ways you would take on the following task:

 

I use SAS EG and have access to a SAS view on our warehouse.   The table contains a record for every day that an account is open, and contains various information for that account (Date, Account_ID, Outstanding_Balance, Arrears_Amount etc etc).   The original table that this comes from has new records added daily, so it is sorted in date order.  We have rows going back to 2012.

 

I need to create monthly generation files that contain information for an account where something specific has changed from the day before.  For example, if on 21st February 2017 an Account_ID has Arrears_Amount of 0 and then on the 22nd February that Account_ID has arrears amount of 50, I would want to create a variable called Arrears_Changed and set that to "Y" and then push that row of data into the generational dataset with all other changes that occurred in February 2017.

 

My challenges are that the dataset is large and not sorted by account_id and that it's a view, so efficiency is really the key here. 

 

Any suggestions greatly appreciated.  

2 REPLIES 2
RW9
Diamond | Level 26 RW9
Diamond | Level 26

If your not changing the original source then:

- Extract the required information into a dataset local to you (i.e. small part, only what you need)

- Sort and process data 

- Output to file

 

If you need to change the original data, then add that in as part of the capture process.

mkeintz
PROC Star

You saying that the original table is sorted by date (but not necessarily by account within date), is that correct?  Are you also saying that your view is not also sorted by date?  (And I take it that this is a proc sql view, not a data step view).

 

If the view is sorted by date, than a data step reading that view and using a hash object can easily track the latest_balance by account for each incoming row/observation - i.e. one hash "dataitem" per account.    One can also do this with an unsorted view as well, but the hash object will take a lot more memory, depending on the number of records for each account.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 941 views
  • 0 likes
  • 3 in conversation