BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I am using an INFILE statement to try and import a dataset, however I keep on getting this error for multiple lines:

 

 

NOTE: Invalid data for CA_ENGLISH in line 21 320-326.
NOTE: Invalid data for CA_DOC in line 21 334-334.

I have quadruple checked my code and while there are way more variables this is the INFILE code I am using to import my dataset:

 

 

data BNDADOS;
	infile "xxxxx"
	delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;
	informat CA_ENGLISG best32. ;
        format CA_ENGLISG best12. ;
        input
           CA_ENGLISH ;
run;

I read online that the above error means that there is something wrong in line 21 for variable CA_ENGLISH and CA_DOC. But I have checked this line in my oroginal CSV file and there is absolutely nothing wrong for that record line OR for either of those variables.

 

 

I also don't really understand the 320-326 part of the error. I read online that this means this is the column in my CSV file where the error is occurring, but in my csv file I only have about 250 columns!

 

The other weird thing is that if I do a regular PROC IMPORT, everythign works just fine. but I can't use a proc import b/c I am going to append 4 datasets and using an INFILE helps me set the type and length of all variabels.

 

Someone please help me I am so stumped!

 

1 ACCEPTED SOLUTION

Accepted Solutions
christinagting0
Quartz | Level 8

I just want to let everyone know that I couldn't figure out what was wrong. I know some of you made suggestions, but I honestly combed through everything and couldn't figure it out.

 

I was able to append 3 out of my 4 datasets, but it was my final dataset that was causing me troubles.

 

What I ended up doing was importing my 4th data set using a proc import (which for some reason cause me no problems at all versus using the INFILE statement) and then I proc sql appended the 4th dataset to my other final dataset. There were some variables that were of different types and for these a converted the to the appropriate types-there were only about 6-7 variables that were not meshing properly.

 

I honestly have no idea what the iNFILE statement would have worked for the other 3 datasets but not for the 4th as they were from the same sources and of the same data...but if anyone else comes across this problem try what I did as a last resort I suppose

View solution in original post

15 REPLIES 15
christinagting0
Quartz | Level 8

The minimum record length was 287.
The maximum record length was 1666

 

What does this mean?

Kurt_Bremser
Super User

@christinagting0 wrote:

The minimum record length was 287.
The maximum record length was 1666

 

What does this mean?


That the longest line contained 1666 characters, and your statement that you don't read more that 250 columns was wrong.

ballardw
Super User

If you posted more of the log we could show you how to read the log and find out which character values are appearing in your numeric field.

The 320-326 are the print columns, one per character, not the "data columns" as Excel would display, in the file with the problem data.

From one of my data sets:

NOTE: Invalid data for Birth in line 680 19-28.
NOTE: Invalid argument to function YRDIF at line 159 column 19.
680       121039,9058,636,1,08/33/1988,83263,1,1,7/1/2016,1,3,3,2,2,,,,,
      89
     177
     265
     353
     441
infilename=d:\data\IPP\data\Text Data\2016\D6_STD Data_2016 3 skip=0 Client=121039 SITE=9058
CLINICIAN=636 SEX=1 Birth=. ZIP=83263 ETHNIC=1 ORACE=1 DatVisit=07/01/2016 PROGRAM_AREA=1
INSURANCE=3 SSPEC=3 CTResult=2 GCResult=2 SyResult=  age=. ctage=1 District=6 RACE=1 racea=1
raceb=  racec=  raced=  racee=  racef=  _ERROR_=1
_INFILE_=121039,9058,636,1,08/33/1988,83263,1,1,7/1/2016,1,3,3,2,2,,,,, _N_=679

The 680 on the left shows the line in the file, the following 89, 177, 265, 353, and 441 show the Number(column) of the first character shown to the right. The _infile_ shows more of the line. This specific says that my variable Birth, which expects a date, has a problem with the value beginning at postion 19 through 28 in the line. These are the characters 08/33/1988. And since August does not have 33 days that is an error for the informat using a date value of mmddyy10.

 

christinagting0
Quartz | Level 8

Thanks @ballardw! Your explanations are always so helpful to me. 

 

question about your response. You said " the following 89, 177, 265, 353, and 441 show the Number(column) of the first character shown to the right." Can you please explain this a bit. I don't think I understand. What do you mean by number and column of the first character?

 

Also, I really don't think it's a simple case of a character field in a numerical variable or an incorrect date field like Aug 33rd. I have thoroughly scanned my original CSV file for the variables that I am getting the error for and they are definitely all the right type of data. 

 

 

Reeza
Super User

We need the import code that goes along with this log. It specifically tells you to look at line 21 of your data and the variable CA_AGE.

What value would you expect, since you say all are valid? 

My guess is missing?

 

You should switch MISSOVER to TRUNCOVER

christinagting0
Quartz | Level 8

@Reeza Here is my import code!

 

for CA_AGE I would expect numerical values and I checked my original CSV and they definitely are all numerical. Also, in my INFILE code you can see that I made the informat best32., format best12 and for input did not put a character symbol.

 

help!

Reeza
Super User

Your input or format statements don't have CA_AGE.

ballardw
Super User

@christinagting0 wrote:

@Reeza Here is my import code!

 

It is incredibly long.

 

for CA_AGE I would expect numerical values and I checked my original CSV and they definitely are all numerical. Also, in my INFILE code you can see that I made the informat best32., format best12 and for input did not put a character symbol.

 

help!


I didn't see a variable CA_AGE on the INPUT statement.

 

Here's a suggestion: clear the log and rerun the data step. Save the entire log to a text file (txt extension helps) and attach the file. We are getting things piecemeal and incomplete. If the data is not too sensitive attaching a file with the first 30 lines might help as well.

 

Also, if you opened the file with Excel or other spreadsheet to check the values that does miss things. And sometimes things that look like numbers in a spreadsheet are actually text but not obviously so. That is one of the reasons you will find many comments about spreadsheets in general being poor choices for data interchange.

christinagting0
Quartz | Level 8

Thanks, 

 

I'll have to try again on Monday. thanks for the tips!

Kurt_Bremser
Super User

In the one example you posted (and probably edited away in the meantime, pulled it from the notification email I got), 

the 303-304 points to the "BN" in "BN 304". So I guess that is where you try to read numerical data into CA_AGE.

christinagting0
Quartz | Level 8

I just want to let everyone know that I couldn't figure out what was wrong. I know some of you made suggestions, but I honestly combed through everything and couldn't figure it out.

 

I was able to append 3 out of my 4 datasets, but it was my final dataset that was causing me troubles.

 

What I ended up doing was importing my 4th data set using a proc import (which for some reason cause me no problems at all versus using the INFILE statement) and then I proc sql appended the 4th dataset to my other final dataset. There were some variables that were of different types and for these a converted the to the appropriate types-there were only about 6-7 variables that were not meshing properly.

 

I honestly have no idea what the iNFILE statement would have worked for the other 3 datasets but not for the 4th as they were from the same sources and of the same data...but if anyone else comes across this problem try what I did as a last resort I suppose

suraj
Obsidian | Level 7

 

Please help me with this:

I am getting below error while submitting this code. I have a txt file with delimeter "|" and One coloum Names has names like 

 

Suraj, Singhal

James, Garg

 

I want to import this data without proc import data.

 

data New;
infile '/C:/temp/test23.csv' dlm="|" firstobs=5 ;
input name $ M_Name $ Type $ Month $ Year $ MetCount AppCount PercentMet;
run;


NOTE: Invalid data for MetCount in line 14 1-170.
NOTE: Invalid data for AppCount in line 15 1-171.
NOTE: Invalid data for PercentMet in line 16 1-195.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+---

NOTE: Invalid data errors for file ''/C:/temp/test23.csv'' occurred
outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.

 

Thanks!

Kurt_Bremser
Super User

@suraj wrote:

 

Please help me with this:

I am getting below error while submitting this code. I have a txt file with delimeter "|" and One coloum Names has names like 

 

Suraj, Singhal

James, Garg

 

I want to import this data without proc import data.

 

data New;
infile '/C:/temp/test23.csv' dlm="|" firstobs=5 ;
input name $ M_Name $ Type $ Month $ Year $ MetCount AppCount PercentMet;
run;


NOTE: Invalid data for MetCount in line 14 1-170.
NOTE: Invalid data for AppCount in line 15 1-171.
NOTE: Invalid data for PercentMet in line 16 1-195.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+---

NOTE: Invalid data errors for file ''/C:/temp/test23.csv'' occurred
outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.

 

Thanks!



a) don't hijack other people's threads, especially those that are marked as "solved". Start your own.

b) post the complete log of a failing step. Use the {i} icon to preserve formatting.

c) when any kind of import from a text file (proc import, data step with infile) fails, post the relevant part of trhe text file. Use the {i} icon to preserve text formatting.

 

My first thought is that you have missing values for your numerical variables, and SAS tries to read them from the following line. Add the truncover option to the infile statement as a first try.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 27338 views
  • 4 likes
  • 5 in conversation