BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi Cynthia,

I can't import the date from cvs file into SAS data, here is the code:

data abc;
infile "/home/path/time.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat id $1.;
informat type $1.;
informat cls_dt mmddyy10.;
format id $1.;
format type $1.;
format cls_dt mmddyy10.;
input id $
type $
cls_dt
;
run;

The error reads invalid data for cls_dt.

Can you help me with the issue?
9 REPLIES 9
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Do consider that you are posting to a general forum, although Cynthia is quite an incredible individual with her time and effort and dedication to tracking posts, replies and the like.

Please provide some additional diagnostic information, please, such as adding a command:

IF _N_ LE 5 THEN PUTLOG '>DIAG>' _ALL_ / _INFILE_;

which will dump your input record to the SASLOG for the first 5 input records.

But first, consider: from the diagnostic message, can you confirm that your "date field" is in the expected input format, as you have defined with the INFORMAT statement?

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
Thank you so much Scott!

Here is my question in a general sense:when I use infile-informat-format-input to import csv files, if the last variable is numeric,it can not be loaded in,error message reads invalid data for variable. But if I add a character variable to the original csv file,immediately following the numeric variable(now the character variable becomes the last column),the numeric variable can be loaded as needed.

I really got confused at this, and want to figure out why and how to resolve this problem, as I assume it can be a common issue that many can encounter.

Thanks again for the kind help here!
Cynthia_sas
SAS Super FREQ
Hi:
It would be better if you could post some of your data, but there are some general things to check for. Consider this "test" data:
[pre]
"id","type","cls_dt"
1,a,11/15/1950
2,b,08/23/1951
3,c,23/8/1951
4,d,11/29/1984
5,e,21Mar2009
6,03/21/2009,f
[/pre]

There's something wrong with ID #3 (the date is NOT MM/DD/YYYY form). Neither is the date for ID #5. Both of those dates are in the wrong form. For ID #6, there's a different problem -- the variable values were entered in the CSV file with CLS_DT and TYPE reversed on the row.

What's going to happen when SAS tries to read this file? Some of the input records will be OK. Some of the input records will generate an error. Here's the LOG that results from trying to read the above file:
[pre]
136 data abc;
137 infile datalines delimiter = ',' MISSOVER DSD firstobs=2 ;
138 informat id $1.type $1. cls_dt mmddyy10.;
139 format id $1. type $1.cls_dt mmddyy10.;
140 input id $ type $ cls_dt ;
141 return;
142 datalines;

NOTE: Invalid data for cls_dt in line 146 5-13.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
146 3,c,23/8/1951
id=3 type=c cls_dt=. _ERROR_=1 _N_=3
NOTE: Invalid data for cls_dt in line 148 5-13.
148 5,e,21Mar2009
id=5 type=e cls_dt=. _ERROR_=1 _N_=5
NOTE: Invalid data for cls_dt in line 149 14-14.
149 6,03/21/2009,f
id=6 type=0 cls_dt=. _ERROR_=1 _N_=6
NOTE: The data set WORK.ABC has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


150 ;
151 run;

[/pre]

You can see that I got exactly the same error message as you reported, for the rows in the data, where there was something wrong with CLS_DT. One good diagnostic technique is to visually examine your file, SAS will only report the first 20 or so errors, but you may be able to find out, by examining the log just which rows have the problem.

To fix records #3 and #5, I could go to the CSV file and re-type them to be MMDDYY format and then resave the CSV file. To fix record #6, the entire line needs to be fixed so that TYPE is where SAS expects it and CLS_DT comes after TYPE on the line.

Another way to read all the different date values would be to use the ANYDTDTE informat instead of the MMDDYY10. informat. However, nothing will help you "fix" bad record #6.

An INFORMAT statement tells SAS how to read records INTO SAS format. So, if your data does not correspond to the INFORMAT you specify, SAS has no choice but to issue an error message. The FORMAT statement tells SAS how to display SAS values when they're used by procedures, such as PROC PRINT or PROC FREQ or PROC MEANS, etc, etc. So, for example, the first date on ID #1 is 11/15/1950. This date is "internally" stored in the SAS dataset as -3334, the number of days from Jan 1, 1960. If you wanted to display that value with different DATE formats, that date value can be "formatted" a number of different ways:
15Nov1950 Date9.
11/15/1950 mmddyy10.
11/15/50 mmddyy8.
1950-11-15 yymmdd10.
November 15, 1950 worddate.

On the other hand, your program can only specify one method for how data should be read INTO SAS using an INFORMAT. I suspect the reason you're getting that error is that you have some bad data rows or records in the CSV file. It's entirely possible that not all the CLS_DT variables use the MMDDYY10. INFORMAT for the values.

Scott showed you a good debugging technique. You might read in the documentation about INFORMATs and FORMATs and "cleaning" data. If you cannot edit the data yourself, then you may only be able to report the errors to the people who maintain the file and ask them to do corrections and then send you a corrected file.

Good luck!
cynthia
deleted_user
Not applicable
Thanks a lot Cynthia.

Here is my code:

data abc;
infile "/home/path/time.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat id $1.type $1. time mmddyy10.;
format id $1. type $1. time mmddyy10.;
input id $ type $ time
;

The data time.csv file are as the follow:

id,type,time
1,a,12/20/2008
2,a,12/21/2008
3,s,12/22/2008
4,d,12/23/2008

Here is the error:

NOTE: Invalid data for time in line 2 5-15.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

2 CHAR 1,a,12/20/2008. 15
ZONE 326233233233330
NUMR 1C1C12F20F2008D
id=1 type=a time=. _ERROR_=1 _N_=1
NOTE: Invalid data for time in line 3 5-15.

3 CHAR 2,a,12/21/2008. 15
ZONE 326233233233330
NUMR 2C1C12F21F2008D
id=2 type=a time=. _ERROR_=1 _N_=2
NOTE: Invalid data for time in line 4 5-15.

4 CHAR 3,s,12/22/2008. 15
ZONE 327233233233330
NUMR 3C3C12F22F2008D
id=3 type=s time=. _ERROR_=1 _N_=3
NOTE: Invalid data for time in line 5 5-15.

5 CHAR 4,d,12/23/2008. 15
ZONE 326233233233330
NUMR 4C4C12F23F2008D
id=4 type=d time=. _ERROR_=1 _N_=4
NOTE: 4 records were read from the infile "/home/path/time.csv".

The problem still exists. If I copy the data and paste in the code after datalines,it works. But in practice, there are too many rows that I can't use datalines but need use infile from the .csv file, it results in error.

If I add one more character column after the time variable in the time.csv file, it works too,that is what I mean "the last variable is a numeric variable".

I checked all date values are in the right form,no bad data here, you can also see in the small data sample in the code above.

Can you help figure out what has led to the error?

Thanks again for your kind help!
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Your input data shows a trailing period character (note the SAS diagnostic mentions column 5-15). The problem is in your input data, not having the expected delimiter character or an end-of-record condition.

Scott Barry
SBBWorks, Inc.
data_null__
Jade | Level 19
> Your input data shows a trailing period character
> (note the SAS diagnostic mentions column 5-15).
Not a period, only displayed as period by in the CHAR portion of the implied LIST statement output. ZONE NUMR is a vertical HEX display of each character which reveals the value as '0D'x. An end of line character on some OSs

[pre]
RULE: ----+----1----+----2----+---
2 CHAR 1,a,12/20/2008. 15
ZONE 326233233233330
NUMR 1C1C12F20F2008D
[/pre]

This implies the file was created on Windows and is being read with UNIX. Unix uses a single character '0A'x as the end of record marker not '0D0A'x as is Windows. SAS provides a way to fix it TERMSTR


>The problem is in your input data, not having the
> expected delimiter character or an end-of-record
> condition.

This statement is correct and using TERMSTR should make it all good.

[pre]
Beginning in SAS 9, the INFILE statement TERMSTR= option specifies the
end-of-line character for the specified file. Use this option to read
files created on UNIX in a SAS session on Windows, and vice versa.
Valid TERMSTR= values are:

CRLF (carriage return line feed) - use TERMSTR=CRLF to
read Windows formatted files. CRLF is the default.

LF (line feed) - use TERMSTR=LF to read UNIX formatted files.

NL (new line) - Use TERMSTR=NL to read UNIX formatted files.
[pre]
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Thanks to data _null_; for info about the TERMSTR= option -- I was unfamiliar with using it.

Regards,

Scott
deleted_user
Not applicable
It works now with TERMSTR=CRLF!

Thanks so much data_null, and Scott and Cynthia. I benefit so much here on this weekend, hope you guys all the best!
deleted_user
Not applicable
Thanks Scott, can you advise how to fix it the the code?

Thank you again!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 6228 views
  • 0 likes
  • 4 in conversation