BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
otalog
Calcite | Level 5

Am trying to import a data set into SAS, but my date-time gives me errors, the date_time looks like this 20191202T071503+000, any help? below is the codes am using to import the dataset, kindly check the date_time portion if it looks ok.

 

 

data WORK.SCL1;
infile 'P:\DATA\SCL_1.csv\' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat RIC $12. ;
* TIP - Regardless of what SAS suggests as the length of the RIC variable increase it to above the
maximum length RIC in your dataset, which for us is 12 characters;
informat Date_Time datetime18.;
informat Type $5. ;
informat Price best32. ;
informat Volume best32. ;
informat Bid best32. ;
informat Ask best32. ;

format RIC $12. ;
format Date_Time datetime18.;


format Price best12. ;
format Volume best12. ;
format Bid best12. ;
format Ask best12. ;
input
RIC $
Date_Time

Type $
Price
Volume
Bid
Ask
;


run;
option obs=max;

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I combined my earlier code into the data step, and added a condition to detect timestamps with a time zone.

I also added one of the offending lines from your log to the csv file, and it was read correctly.

data want;
infile "~/SCL.csv" dlm="," dsd truncover firstobs=2;
input
  _ric :$6.
  domain :$15.
  _date_time :$35.
  type :$6.
  price
  volume
  bid_price
  ask_price
;
if indexc(_date_time,"+")
then do;
  if indexc(_date_time,":")
  then do;
    _date_time = cats(_date_time,":00");
  date_time = input(_date_time,e8601dz35.);
  end;
  else do;
    _date_time = cats(_date_time,"00");
    date_time = input(_date_time,b8601dz20.);
  end;
end;
else date_time = input(_date_time,e8601dt26.);
date = datepart(date_time);
time = timepart(date_time);
format
  date_time e8601dz35.6
  date yymmdd10.
  time time15.6
;
*drop _date_time;
run;

As soon as you're satisfied, uncomment the DROP statement.

 

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26

If there are ERRORs in the log, show us the log for this data step. 

 

If the results are not what you expect, show us the results you are getting and state clearly what you expect to get.

 

For your future benefit, do state there are errors without further explanation, and without showing us the information requested above.

--
Paige Miller
otalog
Calcite | Level 5
Please thank you for the reply, this is the error am getting

NOTE: Invalid data for Date_Time in line 3329 21-52.
3329 0DZC.L,Market Price,2020-03-31T08:00:12.274400869+01,Quote,,,13.52,14.21 72
PaigeMiller
Diamond | Level 26

I asked to see the log, not two lines of the log.

 

Please show the entire log for this DATA step, starting with the DATA command, and continuing down to the last NOTE: after the DATA step.

 

Please format the log proprely, by copying it as text and pasting it into the window that appears when you click on the </> icon.

Insert Log Icon in SAS Communities.png

 

 

--
Paige Miller
otalog
Calcite | Level 5
Here you are please


16 %LET DIRECTORY = P:\;
17 libname MM "P:\DATA";
NOTE: Libref MM was successfully assigned as follows:
Engine: V9
Physical Name: P:\DATA

18 PROC IMPORT DATAFILE= "P:\DATA\S_CL.csv"
19 OUT= outdata
20 DBMS=csv
21 REPLACE;
22 GETNAMES=YES;
23 RUN;

24 /**********************************************************************
25 * PRODUCT: SAS
26 * VERSION: 9.4
27 * CREATOR: External File Interface
28 * DATE: 09DEC21
29 * DESC: Generated SAS Datastep Code
30 * TEMPLATE SOURCE: (None Specified.)
31 ***********************************************************************/
32 data WORK.OUTDATA ;
33 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
34 infile 'P:\DATA\S_CL.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
35 informat _RIC $6. ;
36 informat Domain $12. ;
37 informat Date_Time B8601DZ35. ;
38 informat Type $5. ;
39 informat Price best32. ;
40 informat Volume best32. ;
41 informat Bid_Price best32. ;
42 informat Ask_Price best32. ;
43 format _RIC $6. ;
44 format Domain $12. ;
45 format Date_Time B8601DZ35. ;
46 format Type $5. ;
47 format Price best12. ;
48 format Volume best12. ;
49 format Bid_Price best12. ;
50 format Ask_Price best12. ;
51 input
52 _RIC $
53 Domain $
54 Date_Time
55 Type $
56 Price
57 Volume
58 Bid_Price
59 Ask_Price
60 ;
61 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
62 run;

NOTE: The infile 'P:\DATA\S_CL.csv' is:
Filename=P:\DATA\S_CL.csv,
RECFM=V,LRECL=32767,
File Size (bytes)=18726984184,
Last Modified=30 November 2021 10:06:37,
Create Time=30 November 2021 10:11:38

NOTE: Invalid data for Date_Time in line 3311 21-52.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
3311 0DZC.L,Market Price,2020-03-30T07:51:07.604811288+01,Quote,,,13.17, 67
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.17
Ask_Price=. _ERROR_=1 _N_=3310
NOTE: Invalid data for Date_Time in line 3312 21-52.
3312 0DZC.L,Market Price,2020-03-30T07:51:07.604811288+01,Quote,,,13.17,13.84 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.17
Ask_Price=13.84 _ERROR_=1 _N_=3311
NOTE: Invalid data for Date_Time in line 3313 21-52.
3313 0DZC.L,Market Price,2020-03-30T07:52:21.724184834+01,Quote,,,13.65,13.84 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.65
Ask_Price=13.84 _ERROR_=1 _N_=3312
NOTE: Invalid data for Date_Time in line 3314 21-52.
3314 0DZC.L,Market Price,2020-03-30T07:52:21.724205258+01,Quote,,,13.65,14.35 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.65
Ask_Price=14.35 _ERROR_=1 _N_=3313
NOTE: Invalid data for Date_Time in line 3315 21-52.
3315 0DZC.L,Market Price,2020-03-30T08:07:00.316015099+01,Quote,,,13.74,14.35 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.74
Ask_Price=14.35 _ERROR_=1 _N_=3314
NOTE: Invalid data for Date_Time in line 3316 21-52.
3316 0DZC.L,Market Price,2020-03-30T08:07:00.316015099+01,Quote,,,13.74,14.44 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.74
Ask_Price=14.44 _ERROR_=1 _N_=3315
NOTE: Invalid data for Date_Time in line 3317 21-52.
3317 0DZC.L,Market Price,2020-03-30T13:57:29.723498030+01,Trade,14.26,122,, 70
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Trade Price=14.26 Volume=122 Bid_Price=.
Ask_Price=. _ERROR_=1 _N_=3316
NOTE: Invalid data for Date_Time in line 3318 21-52.
3318 0DZC.L,Market Price,2020-03-30T13:58:26.207018560+01,Trade,14.26,287,, 70
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Trade Price=14.26 Volume=287 Bid_Price=.
Ask_Price=. _ERROR_=1 _N_=3317
NOTE: Invalid data for Date_Time in line 3319 21-52.
3319 0DZC.L,Market Price,2020-03-30T16:40:11.345445606+01,Quote,,,,14.44 67
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=.
Ask_Price=14.44 _ERROR_=1 _N_=3318
NOTE: Invalid data for Date_Time in line 3320 21-52.
3320 0DZC.L,Market Price,2020-03-30T16:59:21.880012385+01,Trade,13.94,234,, 70
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Trade Price=13.94 Volume=234 Bid_Price=.
Ask_Price=. _ERROR_=1 _N_=3319
NOTE: Invalid data for Date_Time in line 3321 21-52.
3321 0DZC.L,Market Price,2020-03-30T17:00:34.911247030+01,Trade,13.90807,88,, 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Trade Price=13.90807 Volume=88 Bid_Price=.
Ask_Price=. _ERROR_=1 _N_=3320
NOTE: Invalid data for Date_Time in line 3322 21-52.
3322 0DZC.L,Market Price,2020-03-30T17:32:38.860150053+01,Trade,13.94,766,, 70
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Trade Price=13.94 Volume=766 Bid_Price=.
Ask_Price=. _ERROR_=1 _N_=3321
NOTE: Invalid data for Date_Time in line 3323 21-52.
3323 0DZC.L,Market Price,2020-03-30T17:35:03.372391914+01,Trade,13.89509,4912,, 74
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Trade Price=13.89509 Volume=4912 Bid_Price=.
Ask_Price=. _ERROR_=1 _N_=3322
NOTE: Invalid data for Date_Time in line 3324 21-52.
3324 0DZC.L,Market Price,2020-03-31T07:51:08.592643908+01,Quote,,,13.17, 67
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.17
Ask_Price=. _ERROR_=1 _N_=3323
NOTE: Invalid data for Date_Time in line 3325 21-52.
3325 0DZC.L,Market Price,2020-03-31T07:51:08.592927144+01,Quote,,,13.17,13.84 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.17
Ask_Price=13.84 _ERROR_=1 _N_=3324
NOTE: Invalid data for Date_Time in line 3326 21-52.
3326 0DZC.L,Market Price,2020-03-31T07:59:12.134650097+01,Quote,,,13.38,13.84 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.38
Ask_Price=13.84 _ERROR_=1 _N_=3325
NOTE: Invalid data for Date_Time in line 3327 21-52.
3327 0DZC.L,Market Price,2020-03-31T07:59:12.135065271+01,Quote,,,13.38,14.06 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.38
Ask_Price=14.06 _ERROR_=1 _N_=3326
NOTE: Invalid data for Date_Time in line 3328 21-52.
3328 0DZC.L,Market Price,2020-03-31T08:00:12.274400869+01,Quote,,,13.52,14.06 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.52
Ask_Price=14.06 _ERROR_=1 _N_=3327
NOTE: Invalid data for Date_Time in line 3329 21-52.
3329 0DZC.L,Market Price,2020-03-31T08:00:12.274400869+01,Quote,,,13.52,14.21 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.52
Ask_Price=14.21 _ERROR_=1 _N_=3328
NOTE: Invalid data for Date_Time in line 3330 21-52.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
3330 0DZC.L,Market Price,2020-03-31T08:00:18.942175117+01,Quote,,,13.71,14.21 72
_RIC=0DZC.L Domain=Market Price Date_Time=. Type=Quote Price=. Volume=. Bid_Price=13.71
Ask_Price=14.21 _ERROR_=1 _N_=3329
NOTE: 255976938 records were read from the infile 'P:\DATA\S_CL.csv'.
The minimum record length was 61.
The maximum record length was 78.
NOTE: The data set WORK.OUTDATA has 255976938 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 24:52.00
cpu time 6:43.98


Errors detected in submitted DATA step. Examine log.
255976938 rows created in WORK.OUTDATA from P:\DATA\S_CL.csv.



ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 24:52.67
cpu time 6:44.21
Tom
Super User Tom
Super User

By "import" do you mean read?  Your posted code looks like the way that PROC IMPORT writes the data steps it generates, but I doubt that PROC IMPORT would decide to use the DATETIME informat to read strings that look like either of these

20191202T071503+000
2020-03-31T08:00:12.274400869+01

Both of those will cause trouble for SAS as neither is considered valid the the corresponding B8601DZ or E8601DZ informat.

The first needs a four digit time offset instead of 3.  The second needs a colon in the time offset.

20191202T071503+0000
2020-03-31T08:00:12.274400869+01:00

You might want to read the value into a character string and then parse it into date, time and offset values using T and +/- as delimiters which you can then convert to DATE, TIME and TIME values independently.

 

ballardw
Super User

From Informat documentation:(emphasis added)

 

Use + for time zones east of the zero meridian, and use – for time zones west of the zero meridian. For example, +0200 indicates a two-hour time difference to the east of the zero meridian, and –0600 indicates a six–hour time difference to the west of the zero meridian.
Restriction: The shorter form +|–hh is not supported.
otalog
Calcite | Level 5

please how do I  read the value into a character string and then parse it into date, time?

Kurt_Bremser
Super User

Here's an example how both of your questionable strings can be read successfully:

data want;
infile datalines dlm="," dsd truncover;
input _date_time :$40.;
if indexc(_date_time,":")
then do;
  _date_time = cats(_date_time,":00");
  date_time = input(_date_time,e8601dz35.);
end;
else do;
  _date_time = cats(_date_time,"00");
  date_time = input(_date_time,b8601dz20.);
end;
format date_time e8601dz35.6;
drop _date_time;
datalines;
20191202T071503+000
2020-03-31T08:00:12.274400869+01
;

You can also see how all times are converted to UTC.

 

otalog
Calcite | Level 5

Thanks Kurt for the reply, the codes work perfectly. But I have the date-time column to be part of other variables and I am trying to read them together. Please, I have attached a sample of csv file if you could kindly look at it and help me read them correctly. My ultimate goal is to split the date and time into separate columns in the data set.

 

 

Thanks

Kurt_Bremser
Super User

I copied your file as is to my ODA, and ran this code successfully:

data want;
infile "~/SCL.csv" dlm="," dsd truncover firstobs=2;
input
  _ric :$6.
  domain :$15.
  date_time :e8601dz29.
  type :$6.
  price
  volume
  bid_price
  ask_price
;
date = datepart(date_time);
time = timepart(date_time);
format
  date_time e8601dt26.6
  date yymmdd10.
  time time15.6
;
run;
otalog
Calcite | Level 5

Thanks a lot for the reply, but after reading the data, most of the data points are missing. Kindly see the log attached.

Kurt_Bremser
Super User

I combined my earlier code into the data step, and added a condition to detect timestamps with a time zone.

I also added one of the offending lines from your log to the csv file, and it was read correctly.

data want;
infile "~/SCL.csv" dlm="," dsd truncover firstobs=2;
input
  _ric :$6.
  domain :$15.
  _date_time :$35.
  type :$6.
  price
  volume
  bid_price
  ask_price
;
if indexc(_date_time,"+")
then do;
  if indexc(_date_time,":")
  then do;
    _date_time = cats(_date_time,":00");
  date_time = input(_date_time,e8601dz35.);
  end;
  else do;
    _date_time = cats(_date_time,"00");
    date_time = input(_date_time,b8601dz20.);
  end;
end;
else date_time = input(_date_time,e8601dt26.);
date = datepart(date_time);
time = timepart(date_time);
format
  date_time e8601dz35.6
  date yymmdd10.
  time time15.6
;
*drop _date_time;
run;

As soon as you're satisfied, uncomment the DROP statement.

 

otalog
Calcite | Level 5

Thanks alot, really appreciate, it worked!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 3602 views
  • 1 like
  • 5 in conversation