There should be an option in the Sort Task to remove duicates as well as a DISTINCT option in query builder to return only unique rows.
It’s not exactly clear what your rules are for removing records so I suspect these may not do what you want. They will cover the case shown. So, if you can’t figure it out it would help if you expand your example to be closer to your actual data.
@maliksmom2000 wrote:
I have about 500 columns and 150M records. These are inspections and about half are duplicate rows for the same "unit" that has been inspected. There may be up to five or six records for some units and they have different data that needs to be kept. However there are many others that have 20 copies of the same inspections and only two of the records are different. How do I do a query builder to eliminate all but unique records?
- Unit ID will always be the same
- Inspection date, record number, etc. may be different between two records for the same unit.
- I need to keep both but I can't have 200M records for 6.4M units.
- I would assume I'd have upwards of 30M with the different inspection records for each unit.
Example:
Unit ID Inspection Date Finding Code
1 2/14 A
1 2/14 A
1 2/15 B
1 2/15 A
I would need to remove the second record from 2/14. I want distinct rows only but I need to compare all of the columns not just one for each record. If that makes sense
Help!