- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am reading a large 3GB csv file in sas. All variables are numeric except for one - DATE, which is a date in DD/MM/YYYY format. I use
PROC IMPORT OUT= WORK.achat
DATAFILE= "D:\products.csv"
DBMS=DLM REPLACE;
DELIMITER='3B'x;
GETNAMES=YES;
DATAROW=2;
RUN;
I noticed that some of the DATE values are missing in the SAS imported WORK.achat file (I can't open the source file = it's too big, but I am sure there is no missingness in the source). The log concerning this var reads
informat DATE mmddyy10. ;
Here's a piece of the error msg:
NOTE: Invalid data for DATE in line 428723 50-59.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
428723 201601;7;02;201602019127800;2016020017134;2222222;17/01/2016;43;1;245;5400;1.0;1.03;13220
89 8;1;1.39;0.164;164.0;1000.0;100002 122
VAR1=201601VAR2=7 VAR3=2 VAR4=2.0160202 VAR5=2.01602 VAR6=2222222
DATE=. VAR7=43 VAR8=1 VAR9=245 VAR10=5400 VAR11=1 VAR12=1.03
VAR13=132208 VAR14=1 VAR15=1.39 VAR16=0.164 VAR17=164 VAR18=1000 VAR19=100002
_ERROR_=1 _N_=428722
I assume dates with day>12 were not recognized. Can someone please help me to enforce a ddmmyy format or change the default mmddyy10. when importing?
Thank you very much.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So PROC IMPORT guessed that the field had date values in month day year order. But the record in question has 17 for the month value. That is impossible.
Why are you running a guessing procedure like PROC IMPORT on a 3 Gbyte file?
Just write your own data step to read the file and you can define the columns as they should be.
You might need to read the date strings as character and build your own logic to figure out what date they are trying to represent.
You could try using the ANYDTDTE informat and let it make a guess for each value what date it represents, but you might not like its result for ambiguous values like 10/12/2018. Is that Dec 10th or Oct 12th?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your help. Used GUESSINGROWS=500000. Took care of the problem. Thx much.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@GSK1 wrote:
I am reading a large 3GB csv file in sas. All variables are numeric except for one - DATE, which is a date in DD/MM/YYYY format. I use
PROC IMPORT OUT= WORK.achat
DATAFILE= "D:\products.csv"
DBMS=DLM REPLACE;
DELIMITER='3B'x;
GETNAMES=YES;
DATAROW=2;
RUN;
I noticed that some of the DATE values are missing in the SAS imported WORK.achat file (I can't open the source file = it's too big, but I am sure there is no missingness in the source). The log concerning this var reads
informat DATE mmddyy10. ;
Here's a piece of the error msg:
NOTE: Invalid data for DATE in line 428723 50-59.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
428723 201601;7;02;201602019127800;2016020017134;2222222;17/01/2016;43;1;245;5400;1.0;1.03;13220
89 8;1;1.39;0.164;164.0;1000.0;100002 122
VAR1=201601VAR2=7 VAR3=2 VAR4=2.0160202 VAR5=2.01602 VAR6=2222222
DATE=. VAR7=43 VAR8=1 VAR9=245 VAR10=5400 VAR11=1 VAR12=1.03
VAR13=132208 VAR14=1 VAR15=1.39 VAR16=0.164 VAR17=164 VAR18=1000 VAR19=100002
_ERROR_=1 _N_=428722
I assume dates with day>12 were not recognized. Can someone please help me to enforce a ddmmyy format or change the default mmddyy10. when importing?
Thank you very much.
The Import procedure guesses formats and apparently none of the first few rows had a first value of the date larger than 12 so it guessed the format was MM/DD/YYYY. Would there have been some reason that the first 428722 lines of data had day values of 01 - 12 AND month values of 01 - 12 so that you wouldn't have errors from the MM/DD/YYYY format used by proc import?
Generally I would say if happy with the rest of the data add a GUESSINGROWS=30000 to the proc import code. It will examine more rows before guessing a format. Your data didn't have an issue until row 428723 of the data (_ERROR_=1 in the log) so that wouldn't help in this case.
If you are sure that all of the dates are actually supposed to be DD/MM/YYYY then copy the code generated by the proc import, paste into the editor and change the informat for the variable, save and run the datastep. But if you have mixed data layouts you'll have to do more work to parse the value in a data step.
You may just have a few bad values in the data file. I worked with one data set that had a value of 77 November for a date...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for the idea! I always use GUESSINGROWS = some bug number when reading character vars, but did not think of it in this case! Used GUESSINGROWS=500000 - fixed all my problems. Thanks a ton!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi:
If you were NOT using PROC IMPORT, but instead were using a DATA step program to read the file, then you could control the DATESTYLE using a system option, as discussed here: https://blogs.sas.com/content/iml/2016/11/11/anydtdte-informat-read-any-date-sas.html .
However, I do not think that DATESTYLE will have an impact on PROC IMPORT (that would be a question for Tech Support).
The Blog post shows using ANYDTDTE. However, you can take full control of the informat for Date variables if you write a DATA step program instead of PROC IMPORT.
Here's an example:
Cynthia
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is a great idea. Think I use this for the future cases. Thank you much.