New SAS User

Completely new to SAS or trying something new with SAS? Post here for help getting started.
BookmarkSubscribeRSS Feed
GSK1
Calcite | Level 5

I am reading a large 3GB csv file in sas. All variables are numeric except for one - DATE, which is a date in DD/MM/YYYY format. I use   

PROC IMPORT OUT= WORK.achat             

DATAFILE= "D:\products.csv"             

DBMS=DLM REPLACE;

DELIMITER='3B'x;
GETNAMES=YES;
DATAROW=2;
RUN;

 

I noticed that some of the DATE values are missing in the SAS imported WORK.achat  file (I can't open the source file = it's too big, but I am sure there is no missingness in the source). The log concerning this var reads

 

informat DATE mmddyy10. ;

 

Here's a piece of the error msg:

NOTE: Invalid data for DATE in line 428723 50-59.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
428723 201601;7;02;201602019127800;2016020017134;2222222;17/01/2016;43;1;245;5400;1.0;1.03;13220
89 8;1;1.39;0.164;164.0;1000.0;100002 122
VAR1=201601VAR2=7 VAR3=2 VAR4=2.0160202 VAR5=2.01602 VAR6=2222222
DATE=. VAR7=43 VAR8=1 VAR9=245 VAR10=5400 VAR11=1 VAR12=1.03
VAR13=132208 VAR14=1 VAR15=1.39 VAR16=0.164 VAR17=164 VAR18=1000 VAR19=100002
_ERROR_=1 _N_=428722

 

I assume dates with day>12 were not recognized. Can someone please help me to enforce a ddmmyy format or change the default mmddyy10. when importing? 

 

Thank you very much.

6 REPLIES 6
Tom
Super User Tom
Super User

So PROC IMPORT guessed that the field had date values in month day year order.  But the record in question has 17 for the month value.  That is impossible.

 

Why are you running a guessing procedure like PROC IMPORT on a 3 Gbyte file?

Just write your own data step to read the file and you can define the columns as they should be.

 

You might need to read the date strings as character and build your own logic to figure out what date they are trying to represent.

You could try using the ANYDTDTE informat and let it make a guess for each value what date it represents, but you might not like its result for ambiguous values like 10/12/2018.  Is that Dec 10th or Oct 12th? 

GSK1
Calcite | Level 5

Thanks for your help. Used GUESSINGROWS=500000. Took care of the problem. Thx much.

ballardw
Super User

@GSK1 wrote:

I am reading a large 3GB csv file in sas. All variables are numeric except for one - DATE, which is a date in DD/MM/YYYY format. I use   

PROC IMPORT OUT= WORK.achat             

DATAFILE= "D:\products.csv"             

DBMS=DLM REPLACE;

DELIMITER='3B'x;
GETNAMES=YES;
DATAROW=2;
RUN;

 

I noticed that some of the DATE values are missing in the SAS imported WORK.achat  file (I can't open the source file = it's too big, but I am sure there is no missingness in the source). The log concerning this var reads

 

informat DATE mmddyy10. ;

 

Here's a piece of the error msg:

NOTE: Invalid data for DATE in line 428723 50-59.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
428723 201601;7;02;201602019127800;2016020017134;2222222;17/01/2016;43;1;245;5400;1.0;1.03;13220
89 8;1;1.39;0.164;164.0;1000.0;100002 122
VAR1=201601VAR2=7 VAR3=2 VAR4=2.0160202 VAR5=2.01602 VAR6=2222222
DATE=. VAR7=43 VAR8=1 VAR9=245 VAR10=5400 VAR11=1 VAR12=1.03
VAR13=132208 VAR14=1 VAR15=1.39 VAR16=0.164 VAR17=164 VAR18=1000 VAR19=100002
_ERROR_=1 _N_=428722

 

I assume dates with day>12 were not recognized. Can someone please help me to enforce a ddmmyy format or change the default mmddyy10. when importing? 

 

Thank you very much.


The Import procedure guesses formats and apparently none of the first few rows had a first value of the date larger than 12 so it guessed the format was MM/DD/YYYY. Would there have been some reason that the first 428722 lines of data had day values of 01 - 12 AND month values of 01 - 12 so that you wouldn't have errors from the MM/DD/YYYY format used by proc import?

Generally I would say if happy with the rest of the data add a GUESSINGROWS=30000 to the proc import code. It will examine more rows before guessing a format. Your data didn't have an issue until row 428723 of the data (_ERROR_=1 in the log) so that wouldn't help in this case.

 

If you are sure that all of the dates are actually supposed to be DD/MM/YYYY then copy the code generated by the proc import, paste into the editor and change the informat for the variable, save and run the datastep. But if you have mixed data layouts you'll have to do more work to parse the value in a data step.

 

You may just have a few bad values in the data file. I worked with one data set that had a value of 77 November for a date...

GSK1
Calcite | Level 5

Thank you so much for the idea! I always use GUESSINGROWS = some bug number when reading character vars, but did not think of it in this case! Used GUESSINGROWS=500000 - fixed all my problems. Thanks a ton!

Cynthia_sas
SAS Super FREQ

Hi:
If you were NOT using PROC IMPORT, but instead were using a DATA step program to read the file, then you could control the DATESTYLE using a system option, as discussed here: https://blogs.sas.com/content/iml/2016/11/11/anydtdte-informat-read-any-date-sas.html .

However, I do not think that DATESTYLE will have an impact on PROC IMPORT (that would be a question for Tech Support).

The Blog post shows using ANYDTDTE. However, you can take full control of the informat for Date variables if you write a DATA step program instead of PROC IMPORT.

Here's an example:
alt_read_csv.png
Cynthia

GSK1
Calcite | Level 5

This is a great idea. Think I use this for the future cases. Thank you much.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2217 views
  • 0 likes
  • 4 in conversation