DATA Step, Macro, Functions and more

Weird issue with reading dates from a CSV file

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 147
Accepted Solution

Weird issue with reading dates from a CSV file

[ Edited ]

Hi all

 

Using SAS 9.4 on Windows.

 

I have also contacted SAS tech support with this issue.  But I wanted to see if anyone here knew a solution and also make people aware of what could be a real GOTCHA.

 

I have dates like this in a CSV file

 

Jan-18

 

Sep-17

Dec-17

 

Where Jan-18 represents January 2018 with the date of the month not recorded. 

 

 

But when I use PROC IMPORT

 

PROC IMPORT OUT= WORK.ruthie

            DATAFILE= "C:\personal\Consults\Ruthie Igwe\ruthie.csv"

            DBMS=CSV REPLACE;

     GETNAMES=YES;

     DATAROW=2;

RUN;

 

It gives me dates like this:

 

01JAN18:00:00:00 

01SEP17:00:00:00  01DEC17:00:00:00

 

With no notes, errors or warnings in the log.

 

Regards

 

Peter


Accepted Solutions
Solution
‎04-23-2018 05:36 PM
PROC Star
Posts: 8,117

Re: Weird issue with reading dates from a CSV file

[ Edited ]

@plf515: Obviously, a poor choice by PROC IMPORT when reading dates like 09/2017. Why they would choose anydtdtm40., rather than anydtdte7., is beyond me!

 

However, I'd just copy, paste and modify the code that PROC IMPORT shows in the log. Thus, for your data,:

data WORK.RUTHIE    ;
  infile '/folders/myfolders/ruthie.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
  informat Agecat $15. ;
  informat GPA best32. ;
  informat Any_dependents $3. ;
  informat Number_dependents best32. ;
  informat Married $3. ;
  informat Employed $12. ;
  informat HoursWork best32. ;
  informat Enroll $22. ;
  informat Start anydtdte7. ;
  informat End anydtdte7. ;
  format Agecat $15. ;
  format GPA best12. ;
  format Any_dependents $3. ;
  format Number_dependents best12. ;
  format Married $3. ;
  format Employed $12. ;
  format HoursWork best12. ;
  format Enroll $22. ;
  format Start date9. ;
  format End date9. ;
  input Agecat  $ GPA Any_dependents  $ Number_dependents
        Married  $ Employed  $ HoursWork Enroll  $
        Start End;
 run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
PROC Star
Posts: 8,117

Re: Weird issue with reading dates from a CSV file

What does Sep-17 represent? Is is 01SEP2017, 17SEP2017, or something else. If the dates don't include the year, are they all from the same year and, if so, which year?

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 147

Re: Weird issue with reading dates from a CSV file

Hi Art

 

I edited my post.  Jan-18 represents January of 2018, with date of month not specified.

 

Peter

Super User
Posts: 22,875

Re: Weird issue with reading dates from a CSV file

Is this from the text file or when you've opened it with Excel?


Can you include a small screenshot of the dates in the CSV file from a text editor? Excel will interpret data on its own so the CSV may have something thats causing SAS to read it as a datetime. 

Also, what does the code indicate for the informat/format for the variable. 

 

What happens if you set GUESSINGROWS=MAX and re-run it?

Frequent Contributor
Posts: 147

Re: Weird issue with reading dates from a CSV file

Hi Reeza

 

If I open the file in Notepad the dates look like this:

 

01/2018,

09/2017,12/2017

 

and these should represent an January of 2018, September of 2017 and so on.  The date of the month is unknown.

 

Thanks


Peter

Solution
‎04-23-2018 05:36 PM
PROC Star
Posts: 8,117

Re: Weird issue with reading dates from a CSV file

[ Edited ]

@plf515: Obviously, a poor choice by PROC IMPORT when reading dates like 09/2017. Why they would choose anydtdtm40., rather than anydtdte7., is beyond me!

 

However, I'd just copy, paste and modify the code that PROC IMPORT shows in the log. Thus, for your data,:

data WORK.RUTHIE    ;
  infile '/folders/myfolders/ruthie.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
  informat Agecat $15. ;
  informat GPA best32. ;
  informat Any_dependents $3. ;
  informat Number_dependents best32. ;
  informat Married $3. ;
  informat Employed $12. ;
  informat HoursWork best32. ;
  informat Enroll $22. ;
  informat Start anydtdte7. ;
  informat End anydtdte7. ;
  format Agecat $15. ;
  format GPA best12. ;
  format Any_dependents $3. ;
  format Number_dependents best12. ;
  format Married $3. ;
  format Employed $12. ;
  format HoursWork best12. ;
  format Enroll $22. ;
  format Start date9. ;
  format End date9. ;
  input Agecat  $ GPA Any_dependents  $ Number_dependents
        Married  $ Employed  $ HoursWork Enroll  $
        Start End;
 run;

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 147

Re: Weird issue with reading dates from a CSV file

Thanks!  That worked.  But what a potential gotcha!

 

Peter

Super User
Posts: 22,875

Re: Weird issue with reading dates from a CSV file

@plf515 personally, I wouldn't consider this a gotcha. It read the dates correctly, but yes, you would need to use DATEPART to extract the date. Anytime you use a guessing procedure to import data, the onus should be on the user to verify the data. You'll run into the same issues with any tool (R/SAS/Python) when not specifying types explicitly. 

 

 

Frequent Contributor
Posts: 147

Re: Weird issue with reading dates from a CSV file

But it read something with 7 characters as if it had 20.  That's a problem.

 

I think there should be at least a note that things are unclear.

 

PROC Star
Posts: 8,117

Re: Weird issue with reading dates from a CSV file

@plf515: I agree!

 

@Reeza: I'd consider it a gotcha .. at least deserving of mention in the documentation! The developers, me thinks, should at least have looked for a colon before having the proc consider a date to be a datetime.

 

However, since correcting that now would mess up any historical code that folks have written I, for one, would like to know which specific entries will be considered to be datetimes rather than dates.

 

Art, CEO, AnalystFinder.com

Super User
Posts: 13,084

Re: Weird issue with reading dates from a CSV file

I wonder if the choice to use ANYDTDTM instead of ANYDTDTE for that data format was something marginally to reduce the "guessing" code of the Proc Import algorithm. As in "if it is a bit goofy try to treat as datetime" as the dates do work and when you get time parts as well.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 284 views
  • 7 likes
  • 4 in conversation