I can't import the date from cvs file into SAS data, here is the code:
infile "/home/path/time.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat id $1.;
informat type $1.;
informat cls_dt mmddyy10.;
format id $1.;
format type $1.;
format cls_dt mmddyy10.;
input id $
Here is my question in a general sense:when I use infile-informat-format-input to import csv files, if the last variable is numeric,it can not be loaded in,error message reads invalid data for variable. But if I add a character variable to the original csv file,immediately following the numeric variable(now the character variable becomes the last column),the numeric variable can be loaded as needed.
I really got confused at this, and want to figure out why and how to resolve this problem, as I assume it can be a common issue that many can encounter.
It would be better if you could post some of your data, but there are some general things to check for. Consider this "test" data:
There's something wrong with ID #3 (the date is NOT MM/DD/YYYY form). Neither is the date for ID #5. Both of those dates are in the wrong form. For ID #6, there's a different problem -- the variable values were entered in the CSV file with CLS_DT and TYPE reversed on the row.
What's going to happen when SAS tries to read this file? Some of the input records will be OK. Some of the input records will generate an error. Here's the LOG that results from trying to read the above file:
136 data abc;
137 infile datalines delimiter = ',' MISSOVER DSD firstobs=2 ;
138 informat id $1.type $1. cls_dt mmddyy10.;
139 format id $1. type $1.cls_dt mmddyy10.;
140 input id $ type $ cls_dt ;
NOTE: Invalid data for cls_dt in line 146 5-13.
id=3 type=c cls_dt=. _ERROR_=1 _N_=3
NOTE: Invalid data for cls_dt in line 148 5-13.
id=5 type=e cls_dt=. _ERROR_=1 _N_=5
NOTE: Invalid data for cls_dt in line 149 14-14.
id=6 type=0 cls_dt=. _ERROR_=1 _N_=6
NOTE: The data set WORK.ABC has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
You can see that I got exactly the same error message as you reported, for the rows in the data, where there was something wrong with CLS_DT. One good diagnostic technique is to visually examine your file, SAS will only report the first 20 or so errors, but you may be able to find out, by examining the log just which rows have the problem.
To fix records #3 and #5, I could go to the CSV file and re-type them to be MMDDYY format and then resave the CSV file. To fix record #6, the entire line needs to be fixed so that TYPE is where SAS expects it and CLS_DT comes after TYPE on the line.
Another way to read all the different date values would be to use the ANYDTDTE informat instead of the MMDDYY10. informat. However, nothing will help you "fix" bad record #6.
An INFORMAT statement tells SAS how to read records INTO SAS format. So, if your data does not correspond to the INFORMAT you specify, SAS has no choice but to issue an error message. The FORMAT statement tells SAS how to display SAS values when they're used by procedures, such as PROC PRINT or PROC FREQ or PROC MEANS, etc, etc. So, for example, the first date on ID #1 is 11/15/1950. This date is "internally" stored in the SAS dataset as -3334, the number of days from Jan 1, 1960. If you wanted to display that value with different DATE formats, that date value can be "formatted" a number of different ways:
November 15, 1950 worddate.
On the other hand, your program can only specify one method for how data should be read INTO SAS using an INFORMAT. I suspect the reason you're getting that error is that you have some bad data rows or records in the CSV file. It's entirely possible that not all the CLS_DT variables use the MMDDYY10. INFORMAT for the values.
Scott showed you a good debugging technique. You might read in the documentation about INFORMATs and FORMATs and "cleaning" data. If you cannot edit the data yourself, then you may only be able to report the errors to the people who maintain the file and ask them to do corrections and then send you a corrected file.
NOTE: Invalid data for time in line 2 5-15.
2 CHAR 1,a,12/20/2008. 15
id=1 type=a time=. _ERROR_=1 _N_=1
NOTE: Invalid data for time in line 3 5-15.
3 CHAR 2,a,12/21/2008. 15
id=2 type=a time=. _ERROR_=1 _N_=2
NOTE: Invalid data for time in line 4 5-15.
4 CHAR 3,s,12/22/2008. 15
id=3 type=s time=. _ERROR_=1 _N_=3
NOTE: Invalid data for time in line 5 5-15.
5 CHAR 4,d,12/23/2008. 15
id=4 type=d time=. _ERROR_=1 _N_=4
NOTE: 4 records were read from the infile "/home/path/time.csv".
The problem still exists. If I copy the data and paste in the code after datalines,it works. But in practice, there are too many rows that I can't use datalines but need use infile from the .csv file, it results in error.
If I add one more character column after the time variable in the time.csv file, it works too,that is what I mean "the last variable is a numeric variable".
I checked all date values are in the right form,no bad data here, you can also see in the small data sample in the code above.
Can you help figure out what has led to the error?
Your input data shows a trailing period character (note the SAS diagnostic mentions column 5-15). The problem is in your input data, not having the expected delimiter character or an end-of-record condition.
> Your input data shows a trailing period character
> (note the SAS diagnostic mentions column 5-15).
Not a period, only displayed as period by in the CHAR portion of the implied LIST statement output. ZONE NUMR is a vertical HEX display of each character which reveals the value as '0D'x. An end of line character on some OSs
This implies the file was created on Windows and is being read with UNIX. Unix uses a single character '0A'x as the end of record marker not '0D0A'x as is Windows. SAS provides a way to fix it TERMSTR
>The problem is in your input data, not having the
> expected delimiter character or an end-of-record
This statement is correct and using TERMSTR should make it all good.
Beginning in SAS 9, the INFILE statement TERMSTR= option specifies the
end-of-line character for the specified file. Use this option to read
files created on UNIX in a SAS session on Windows, and vice versa.
Valid TERMSTR= values are:
CRLF (carriage return line feed) - use TERMSTR=CRLF to
read Windows formatted files. CRLF is the default.
LF (line feed) - use TERMSTR=LF to read UNIX formatted files.
NL (new line) - Use TERMSTR=NL to read UNIX formatted files.