The trouble with "CSV", at times, is that it isn't really a strong standard, and the way Excel reads and writes CSV files can be both a help and a hindrance.
But the issue it sounds like you're facing is the infamous "carriage returns in cells/fields" issue. For example, the user is typing in a cell and hits alt-return, treating the cell almost like a cell in a Word table. This can equally happen when users paste text into a free-text field in a database front end.
In essence, when the row of data or row in Excel is written to a CSV, you can end up with something like:
Col_1,Col_2,Col_3
1,123,"Some text"
2,456,"Sentence followed by a return.
Then another sentence."
3,678,"Back to well behaved"
Reading these can be very tricky, especially if you then throw in commas inside the quoted text.
My fix for this was to write a CSV remediation process. If you told it the number of columns in the CSV, it would parse the file line by line and "fix" issues like the one on record 2 above, converting it to:
Col_1,Col_2,Col_3
1,123,"Some text"
2,456,"Sentence followed by a return.<cr> Then another sentence."
3,678,"Back to well behaved"
It allowed you to substitute whatever you wanted for errant carriage returns inside quoted text fields.
You do have to make certain assumptions.
1) the CSV is regular/rectangular - a fixed number of columns, with no deviation
2) text is "properly" quoted
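A minimal sketch of that kind of remediation pass, in Python, under the two assumptions above. This is not the original tool: the function name remediate_csv, its parameters and the "<cr> " placeholder are hypothetical and chosen for illustration. It decides whether a quoted field is still open by checking for an odd number of double quotes, then uses the expected column count as a sanity check on each rebuilt record.

```python
import csv


def remediate_csv(text: str, expected_columns: int, placeholder: str = "<cr> ") -> str:
    """Join physical lines that belong to one record, replacing the embedded
    line breaks with `placeholder`. Assumes a rectangular, properly quoted CSV."""
    fixed_rows = []
    pending = None  # partial record whose quoted field is still open

    for line in text.splitlines():
        if pending is None and not line.strip():
            continue  # skip blank lines between records
        candidate = line if pending is None else pending + placeholder + line

        # An odd number of double quotes means a quoted field is still open.
        # An escaped quote ("") adds two, so it does not change the parity.
        if candidate.count('"') % 2 == 1:
            pending = candidate
            continue

        pending = None
        fields = next(csv.reader([candidate]))
        if len(fields) != expected_columns:
            raise ValueError(
                f"expected {expected_columns} columns, got {len(fields)}: {candidate!r}"
            )
        fixed_rows.append(candidate)

    if pending is not None:
        raise ValueError("unterminated quoted field at end of input")
    return "\n".join(fixed_rows) + "\n"


if __name__ == "__main__":
    broken = (
        "Col_1,Col_2,Col_3\n"
        '1,123,"Some text"\n'
        '2,456,"Sentence followed by a return.\n'
        'Then another sentence."\n'
        '3,678,"Back to well behaved"\n'
    )
    print(remediate_csv(broken, expected_columns=3), end="")
```

Raising an error on a column-count mismatch, rather than guessing, keeps the pass from silently producing a file that looks clean but isn't.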
We applied this to CSV files BEFORE attempting to ingest them using normal techniques; it wasn't a process that did the fixing as part of ingesting them. Our ingest process tracked and reconciled the CSV files, so we had "lineage/audit" on the fact that we'd altered the input files.
One further technique that's useful: you can use a combination of the Unix/Linux commands grep, wc and uniq (or suitably capable Windows ports) to detect any CSV files that are "broken" in this way. Essentially, you test that each line has exactly the expected number of commas (allowing for commas inside quoted text); any file with lines that have more or fewer commas than expected is "bad". A file with any lines that have too many commas is structurally flawed and irredeemable. A file with lines that have too few may well be fixable using the above process.
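The same per-line check can be sketched in Python instead of grep, wc and uniq. The names commas_outside_quotes and classify_csv are hypothetical, and the three labels simply mirror the categories described above. Note that quote state is reset on every physical line, so a continuation line containing many commas could be miscounted; treat the result as a first-pass triage rather than a verdict.

```python
from collections import Counter


def commas_outside_quotes(line: str) -> int:
    """Count commas on one physical line, ignoring those inside double quotes."""
    in_quotes = False
    count = 0
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # an escaped "" toggles twice, so no net change
        elif ch == "," and not in_quotes:
            count += 1
    return count


def classify_csv(text: str, expected_columns: int) -> str:
    """Return 'ok', 'possibly fixable' or 'irredeemable' for a CSV given as text."""
    expected_commas = expected_columns - 1
    # Tally how many lines have each comma count (similar in spirit to `uniq -c`).
    tally = Counter(commas_outside_quotes(line) for line in text.splitlines() if line)
    if any(n > expected_commas for n in tally):
        return "irredeemable"
    if any(n < expected_commas for n in tally):
        return "possibly fixable"
    return "ok"


if __name__ == "__main__":
    broken = (
        "Col_1,Col_2,Col_3\n"
        '1,123,"Some text"\n'
        '2,456,"Sentence followed by a return.\n'
        'Then another sentence."\n'
        '3,678,"Back to well behaved"\n'
    )
    print(classify_csv(broken, expected_columns=3))
```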