how to process incremental data in DMS

Contributor
Posts: 66

Hi,

We are working on a deduplication process in which we have processed the full bulk data set and created a cluster table that assigns a unique cluster ID to each group of matching records. Next we will receive incremental data, which may contain existing records as well as new ones. If we run the same process on it, it will create a set of cluster IDs similar to the earlier ones. How do we join these clusters? Do we need to process the whole data set (including the incremental data) every time, or is there a way to process only the incremental data? We are also facing a performance issue: we fetch 10 million records, run match code clustering, and insert the results into a database table, which takes more than 4 days. Please suggest how to improve this.

Thanks in advance.
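
To make the incremental case above concrete, here is a minimal Python sketch, not DataFlux-specific. It assumes a cluster is defined by exact equality of a single match code, which is a simplification of real clustering conditions. The function name `assign_clusters` and the record and match code values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def assign_clusters(
    existing: Dict[str, int],
    incremental: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int, bool]]:
    """Assign cluster IDs to incremental records without reprocessing old data.

    existing    : match code -> cluster ID from the previous run (updated in place)
    incremental : (record_id, match_code) pairs from the new batch
    Returns (record_id, cluster_id, opened_new_cluster) for each new record.
    Assumption: one match code per record, clusters formed by exact equality.
    """
    # Continue numbering after the highest cluster ID already in use.
    next_id = max(existing.values(), default=0) + 1
    results = []
    for record_id, match_code in incremental:
        opened_new_cluster = match_code not in existing
        if opened_new_cluster:
            # Unseen match code: open a new cluster for it.
            existing[match_code] = next_id
            next_id += 1
        # Seen match code: the record joins the cluster it already belongs to.
        results.append((record_id, existing[match_code], opened_new_cluster))
    return results
```

Under that assumption, only the lookup of match code to cluster ID has to be kept between runs, so the earlier 10 million records would not need to be reclustered for each increment.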

PROC Star
Posts: 1,167

Re: how to process incremental data in DMS

Posted in reply to maheshtalla

One piece that I can make a suggestion on is the insert to database part. If it is running slowly, investigate the bulk load options for whatever database you're using.
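
As a rough illustration of the general idea (sending rows in large batches instead of one at a time), here is a minimal Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in; a real bulk loader is database-specific and will look different. The table name `clusters`, its columns, and the function name `insert_in_batches` are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Tuple


def insert_in_batches(
    conn: sqlite3.Connection,
    rows: Iterable[Tuple[str, int]],
    batch_size: int = 10_000,
) -> int:
    """Insert (record_id, cluster_id) rows in batches, one commit per batch.

    Assumes a hypothetical table clusters(record_id, cluster_id) already exists.
    Returns the number of rows inserted.
    """
    it = iter(rows)
    total = 0
    while True:
        # Pull the next chunk without loading the whole input into memory.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO clusters (record_id, cluster_id) VALUES (?, ?)",
            batch,
        )
        conn.commit()  # one commit per batch instead of one per row
        total += len(batch)
    return total
```

The batch size is a tuning knob, not a recommendation; the right value depends on the database and on row width.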

Tom
