Hi everyone,
I need to find matches, based on a matching rule, in a large data set of ~40 million records that is still growing. I designed the process to generate match codes using parallel processing (~4 hours for 60 million records on the first run) and then to generate match codes only for the delta records, which takes just minutes. This reduced the time significantly.
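For context, here is a minimal sketch of the kind of match-code step described above, assuming SAS Data Quality's PROC DQMATCH with a hypothetical delta table WORK.DELTA_RECS, a NAME variable, and the 'Name' match definition at sensitivity 85; the actual match definitions, variables, and sensitivities would follow your matching rule, and the QKB locale must already be loaded.

/* Generate match codes only for the new/changed (delta) records.            */
/* Table name, variable, match definition, and sensitivity are assumptions.  */
proc dqmatch data=work.delta_recs out=work.delta_mc;
   criteria matchdef='Name' var=name matchcode=mc_name sensitivity=85;
run;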
Using the Clustering node, I can find the matches, but it takes far too long (3-4 hours).
Any suggestions to improve performance and reduce the time?
Thanks,
Rama
Hi Rama
Re-clustering your entire set of records won't be efficient, and the process duration will keep growing over time.
Once you have calculated match codes and clustered your initial dataset, future updates can be handled with match codes and SQL lookups that apply the same rules as your Clustering node.
Essentially, you look up possible matches for each new record in your initial dataset, for example with a join like the one sketched below.
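To illustrate, here is a minimal sketch of that lookup, assuming the clustered base table already stores a match code column and a cluster ID, and the delta records carry a match code built with the same rule; all table and column names (BASE_CLUSTERS, DELTA_MC, MC_NAME, CLUSTER_ID) are hypothetical.

/* Assign new records to existing clusters by joining on the match code.     */
/* Use the same match-code fields and rules as in your Clustering node.      */
proc sql;
   create table work.delta_matched as
   select d.*, b.cluster_id
   from work.delta_mc as d
        left join work.base_clusters as b
        on d.mc_name = b.mc_name;
quit;

Records that come back with a missing CLUSTER_ID have no match in the existing data and would start new clusters.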
Hope that helps.
Vincent
Thanks, Vincent,
I believe it will improve performance.