How to the solve the look solution of this problem with very very large dataset?
I have a two datasets, dataset1 and dataset2.
dataset1 has
client_id var1_var25 (so in total 26 variables)/8client_id is the key and is unique but not ordered */
1
4
6
9
2
3
dataset2 has duplicate client_ids/* obviously coz the dataset has daily balances of bank customers for the last so many months*/
client_id date daily_balances
1 14/10/2014 5000 /*some date and daily balances of clients until end of file*/
2
1
1
3
4
3
2
Q1. I need to simply to do a left join to get the daily balances in dataset1 coz i want the daily balances of only those client_id's of dataset1? The problem is that that dataset2 has over 250 million records and dataset1 has a million records. Any efficient solution with SAS 9.2 please?
Q2. I want the last balance of each client_id, just as simple as to sort and last.id after the join?
I have no clue how to manage the look up dataset1(with 1 million records) to dataset2(with 250 million records). I'd appreciate any help please.
I'm not sure if I can help you, but I think it would be useful if you could provide the community more information about your problem:
There would be options creating indexes and eventually also storing the data using the SPDE engine.
But in the end I believe that taking a "traditional" approach and sorting/merging the data will be fastest in your case and give you the most flexibility to answer multiple questions downstream.
In case you only need the latest record per customer from dataset2 then what could work is to first write ID and latest date to a hash table (1 pass through the data) and then in a second pass only select the records matching with the hash. Write these records to a new table and then sort this new table with the reduced volumes.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.