Please note that this is not an ERROR. Errors will stop the data step generated by proc import from running. It is a note about data invalid for a specific variable.
A quick "how to read" one of the messages:
NOTE: Invalid data for TA_TEL_NUM_1 in line 40 280-290.
This tells you the source line in the file read, Line 40. So if needed you can go to the CSV and examine it. Caution: CSV opened by spreadsheet software may alter appearance of some values and if saved from there alter the data. Also if the Guessingrows option had been set to a minimum of 40, for this file, the ******* would have been considered for setting the variable type.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
40 39,10052229,Yes,,TEL9,Asct Trace No data,Verbal New – 2 Ver,BETHSHEAN CRAIGO,,MONTROSE,ANGUS,DD109LD
101 ,1700,Movement,,,0,0,NO,No,Strong,0,0,0,0,No,0,99,No delete records,No,,No searches ever,No searches
201 ever,Associate,1,3,Some,No,0,0,0,0,No,0,No load records,No delete records,Yes,***********,No search
301 es ever,No searches ever,2,0,Last link in a chain,06042022,68070017907,,,,,,274,Associate,"MR,MARK J
401 AMES, ,HOLLOWAY",09071968,", ,",", ,",", ,",NO,NO,+0176,0,0.5 461
The RULE line is a ruler so you can find the specific columns involved. the + indicate columns 5, 15, 25 etc. The 1, 2, 3 are columns 10, 20, 30 . the 0 is column 100. The line wraps if long enough and the 101, 201 etc. Indicate the column start position of that portion of the line read.
So when you have the invalid data at column 280, you start in the line that that has 201 on the left margin and go across to the column under the 8 in the ruler to find column 280. So you can see the ******** encountered.
Compare the informat assigned for that variable:
94 informat TA_TEL_NUM_1 best32. ;
The informat is best32. or a generic numeric informat that will accept all sorts of simple numeric values.
This sort of thing, "invalid data" is common from depending on how well your source data is documented. Someone may have a document that says something like "missing values for this field will appear as ********** " (or some other character) Telephone numbers may often be 9999999999 or 0000000000 but I have seen 9990000000 0009999999 as well. Other data sources are likely to have "numeric" fields that are mostly okay but then throw in something like <0.1 or NA or UNK. So sometimes you are best off to create custom informats that will handle those special cases to provide a desired numeric result and prevent invalid data messages.
I have a few data sources where I have custom informats to read site id values because they keep adding such and the custom informat throws this invalid data message so i can question the source about the other metadata that I need to process the data.
... View more