BookmarkSubscribeRSS Feed
Autotelic
Obsidian | Level 7

I'm on SAS Enterprise Guide 7.11, SAS 9.4.

 

When trying to import a csv file, there's a line that disappears. It is the one mentioned in the error below.

image.png

(See block quote of this at the bottom of this question).

 

I've opened the file in Notepad++, but can't find anything wrong with the missing line.

Here's a screen shot placed at the missing line at column 283 (where SAS claims the error begins):

image.png

 

The code I'm using is

Data Want;
    Length
        'Nº de Operación de Venta C&C'n $ 29
        "Estado Operación"n $ 46
        "Código de Pedido"n   8
        "Estado del Pedido"n $ 46
        "Fecha de Pedido"n   8
        "Hora de Pedido"n   8
        "Referencia ECI"n $ 18
        "Descripción Referencia ECI"n $ 40
        "División"n      $ 5
        "Descripción División"n $ 20
        "Subdivisión"n   $ 7
        "Descripción Subdivisión"n $ 20
        "Método de Envío"n $ 16
        "Fecha de Compromiso Original"n   8
        "Fecha de Compromiso"n   8
        "Importe Original"n $ 10
        Importe          $ 10
        Unidades           8
        "Unidades Pedidas"n   8;
    Format
        'Nº de Operación de Venta C&C'n $CHAR29.
        "Estado Operación"n $CHAR46.
        "Código de Pedido"n BEST13.
        "Estado del Pedido"n $CHAR46.
        "Fecha de Pedido"n Date9.
        "Hora de Pedido"n TIME8.
        "Referencia ECI"n $CHAR18.
        "Descripción Referencia ECI"n $CHAR40.
        "División"n      $CHAR5.
        "Descripción División"n $CHAR20.
        "Subdivisión"n   $CHAR7.
        "Descripción Subdivisión"n $CHAR20.
        "Método de Envío"n $CHAR16.
        "Fecha de Compromiso Original"n DATE9.
        "Fecha de Compromiso"n DATE9.
        "Importe Original"n $CHAR10.
        Importe          $CHAR10.
        Unidades         Comma12.
        "Unidades Pedidas"n Comma12.;
    Informat
        'Nº de Operación de Venta C&C'n $CHAR29.
        "Estado Operación"n $CHAR46.
        "Código de Pedido"n BEST13.
        "Estado del Pedido"n $CHAR46.
        "Fecha de Pedido"n Date9.
        "Hora de Pedido"n TIME8.
        "Referencia ECI"n $CHAR18.
        "Descripción Referencia ECI"n $CHAR40.
        "División"n      $CHAR5.
        "Descripción División"n $CHAR20.
        "Subdivisión"n   $CHAR7.
        "Descripción Subdivisión"n $CHAR20.
        "Método de Envío"n $CHAR16.
        "Fecha de Compromiso Original"n DATE9.
        "Fecha de Compromiso"n DATE9.
        "Importe Original"n $CHAR10.
        Importe          $CHAR10.
        Unidades         Comma12.
        "Unidades Pedidas"n Comma12.;
    Infile "file_path_name"
        Firstobs=2
        LRECL=600
        ENCODING="UTF-16"
        TERMSTR=CRLF
        DLM=","
        MISSOVER
        DSD;
    Input
        'Nº de Operación de Venta C&C'n : $CHAR29.
        "Estado Operación"n : $CHAR46.
        "Código de Pedido"n :  BEST13.
        "Estado del Pedido"n : $CHAR46.
        "Fecha de Pedido"n :  MMDDYY10.
        "Hora de Pedido"n :  TIME5.
        "Referencia ECI"n : $CHAR18.
        "Descripción Referencia ECI"n : $CHAR40.
        "División"n      : $CHAR5.
        "Descripción División"n : $CHAR20.
        "Subdivisión"n   : $CHAR7.
        "Descripción Subdivisión"n : $CHAR20.
        "Método de Envío"n : $CHAR16.
        "Fecha de Compromiso Original"n :  YYMMDD10.
        "Fecha de Compromiso"n :  MMDDYY10.
        "Importe Original"n : $CHAR10.
        Importe          : $CHAR10.
        Unidades         :  Comma12.
        "Unidades Pedidas"n :  Comma12.;
Run;

 

 

NOTE: Invalid data for 'Unidades Pedidas'n in line 10524 283-318.
REGLA: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
10524 "00401430761020520170823225656","Cancelled","1723549001754","Cancelled","8/23/2017","22:57 ","001002
101 535973537 ","SET PULSERAS 4 ","00102","COMPLEMENTOS ","0010258","B
201 ISUTERIA ","HomeDelivery","2017-09-07","9/7/2017","4.45 €","0.00 €","1","0".."004014307610
ZONE 4555454422222222222222466646667677222333323323322232323333222323328222323328222322232002333333333333
NUMR 93545291000000000002C28FD545C965292C22017D09D072C29F7F20172C24E45002C20E00002C212C202DA2004014307610
301 20520170823225656","Cancelled","1723549001754","Cancelled","8/23/2017","22:57 ","001016687233862042"
401 ,"CMST E/RE N/C COMB. LUREX ","00109","JOVEN ELLA ","0010919","MARCAS PRO.JOV
501 EN E ","HomeDelivery","2017-09-07","9/7/2017","2.99 €","0.00 €","1","0" 572
'Nº de Operación de Venta C&C'n=00401430761020520170823225656 'Estado Operación'n=Cancelled 'Código de Pedido'n=1723549001754
'Estado del Pedido'n=Cancelled 'Fecha de Pedido'n=23AUG2017 'Hora de Pedido'n=22:57:00 'Referencia ECI'n=001002535973537
'Descripción Referencia ECI'n=SET PULSERAS 4 'División'n=00102 'Descripción División'n=COMPLEMENTOS 'Subdivisión'n=0010258
'Descripción Subdivisión'n=BISUTERIA 'Método de Envío'n=HomeDelivery 'Fecha de Compromiso Original'n=07SEP2017
'Fecha de Compromiso'n=07SEP2017 'Importe Original'n=4.45 € Importe=0.00 € Unidades=1 'Unidades Pedidas'n=. _ERROR_=1 _N_=10523
38 REPLIES 38
Ksharp
Super User

If you are using PROC IMPORT , try option  GUESSINGROW=max  .

ballardw
Super User

And post the code used to attempt that read.

The error you are getting is usually related to the informat associated with a variable. HOW that informat is specified can cause issues.

 

Also your picture shows a small vertical purple line. Did you add that? If not, it is likely whatever that represents is the issue.

Autotelic
Obsidian | Level 7

I added the code. The vertical purple line isn't a character, it's the spot where I placed the cursor.
Thanks!

Tom
Super User Tom
Super User

Looking at the line that it is failing on it is show way more that 19 items on one line.

It is also show end of line mark ('0D0A'x) in the middle of line.

Are you sure that data file is using UTF-16 encoding?  If so it should NOT have '0D0A'x as that is an invalid 16 bit code.

Autotelic
Obsidian | Level 7
How do you identify the end of line mark? I can't find it. But, judging by the log and my knowledge of the data, it should be the two consecutive full stops in the line above "ZONE 4555454422222222222222466646667677222333323323322232323333222323328222323328222322232002333333333333".
No, I'm not sure about the encoding. I can't remember at the moment (and also can't check), but I think I copied the encoding from what SAS EG chose when I manually imported the file. Also, Notepad++ seems to support this.
Is there a way to get an "official" answer to what the encoding is?
TomKari
Onyx | Level 15

At the circled point in the attached, instead of the "0" being followed by a comma, it is being followed by the hex characters "2080". Because of the lack of a comma, SAS is interpreting everything to the right as part of the field.

 

You need to get those two hex characters out of there.

 

TomFlagged.png

Autotelic
Obsidian | Level 7
But at the point a line ends, it's supposed to be a "CRLF" and that's what I see in Notepad++. Am I misinterpreting something here?
Tom
Super User Tom
Super User

How many lines does notepad+ see in the file?

Now count how many lines SAS sees if you don't add anything to the INFILE statement.

data _null_;
  infile "file_path_name";
  input;
run;

If they are the same then leave most of those other things off the INFILE statement.  If you are running Unix and the carriage returns mess up the last field there are other ways to deal with it.

data test;
  infile "file_path_name" dsd truncover firstobs=2;
  length var1-var19 $50 ;
  input var1-var19 ;
run;

If you want to see what is in file read it as binary and look.

data _null_;
  infile "file_path_name" recfm=f lrecl=100 obs=20 encoding='any';
  input;
  list;
run;

 

Autotelic
Obsidian | Level 7
Notepad++ sees as many lines as SAS reports after I run the first of the snippets above: NOTE: 16813 records were read from the infile
However this number doesn't correspond to the number of CRLFs in the file.
I expect to have 16809 observations (as many CRLFs as there are in the file).
Tom
Super User Tom
Super User
If you expect fewer lines than SAS and NOTEPAD+ are reporting then that is an indication that one or more of your character variables have the CRLF characters embedded in their value.
There are many posts on this site on methods to pre-process the file to eliminate the embedded CRLF so that you can get a file with the right number of observations.
Autotelic
Obsidian | Level 7

I'm not sure, but I don't think so, I checked the "extra" lines specifically and they all end with LF and continue in the next line. I also checked and whenever it ends with LF, the data for that line isn't over yet, it carries onto to the next line and when it reaches the end of the data, I find a CRLF.

 

Everything seems to be working fine with these lines. What I'm asking about, as far as I can tell, is unrelated to this since every line the vicinity of the problematic line actually ends with CRLF.

Tom
Super User Tom
Super User

Something else is happening.  If your file is really in UTF-16 then perhaps it is somehow corrupted and it is confusing SAS?

If you look at the top of the file does it look like a UTF-16 file?

For example here is a simple program to create two lines each into a "normal" and a UTF-16 file.

filename temp  temp ;
filename utf16 temp ;
data _null_;
  set sashelp.class (obs=2) ;
  file temp dsd termstr=crlf ;
  put (_all_) (+0);
  file utf16 dsd termstr=crlf encoding='utf-16' ;
  put (_all_) (+0);
run;

If I then look at the hex codes in those files you can see that UTF-16 files take two bytes per character.

227  data _null_;
228    infile temp recfm=f lrecl=100 encoding='any';
229    input;
230    list;
231  run;

NOTE: The infile TEMP is:
      File Size (bytes)=42

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9

1   CHAR  Alfred,M,14,69,112.5..Alice,F,13,56.5,84.. 42
    ZONE  466766242332332333230046666242332332323300
    NUMR  1C6254CDC14C69C112E5DA1C935C6C13C56E5C84DA
NOTE: 1 record was read from the infile TEMP.
NOTE: DATA statement used (Total process time):
      real time           0.17 seconds
      cpu time            0.00 seconds


232
233  data _null_;
234    infile utf16 recfm=f lrecl=100 encoding='any';
235    input;
236    list;
237  run;

NOTE: The infile UTF16 is:
      File Size (bytes)=86

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9

1   CHAR  ÿþA.l.f.r.e.d.,.M.,.1.4.,.6.9.,.1.1.2...5.....A.l.i.c.e.,.F.,.1.3.,.5.6...5.,.8.4..... 86
    ZONE  FF406060706060204020303020303020303030203000004060606060204020303020303020302030300000
    NUMR  FE10C060205040C0D0C01040C06090C0101020E050D0A010C0903050C060C01030C05060E050C08040D0A0
NOTE: 1 record was read from the infile UTF16.
NOTE: DATA statement used (Total process time):
      real time           0.39 seconds
      cpu time            0.01 seconds

Perhaps something has edited the file that did not understand it was UTF-16 and has inserted invalid 2-byte sequence that is confusing SAS?

Autotelic
Obsidian | Level 7

No idea. I'm way out of my element here, trying to convey as much useful information as I can.
When I import the file manually, SAS is able to do it correctly, but it seems to preprocess the file first as the file in the infile statement is a .txt and it uses the enconding WLATIN1. If I try to import using this encoding I get an error and it says that 'A byte-order mark indicates that the data is encoded in "utf-16le"'.

How would I go about finding that 2-byte character that might be in the file? I can't find anything out of the ordinary when I open the file in Notepad++, nor on Notepad for that matter.

Plus, when I import the file in pandas (with the encoding UTF-16), it imports it fine as well.

I'm at a loss.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 38 replies
  • 5945 views
  • 6 likes
  • 7 in conversation