
INPUT statement treatment of "accented" ASCII chars

Occasional Contributor
Posts: 9

I'm reading a flat ASCII text file with fixed-length records and fixed-width fields. One field is the department name, and the word "Café" appears in it. On the records where this happens, the INPUT pointer jumps backwards 1 byte, so my @xx specs are all off for the remainder of the record and the data gets hosed.

Is there an INFILE option or parm that I can use to keep everything lined up?

thanks

Doug


Accepted Solutions
Solution
‎02-18-2014 03:41 AM
Respected Advisor
Posts: 4,173

Re: INPUT statement treatment of "accented" ASCII chars

Are you sure your text file is ASCII and not ANSI or UTF? A good way to check is to open the file in Notepad++ and look at what the Encoding menu reports.

I'm not sure why your pointer would be "jumping backwards", but I've certainly had my fun with the "é" before. The INFILE statement has an ENCODING= option, which you should set to the encoding of your source file.
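
If Notepad++ isn't at hand, the same check can be roughly approximated in a few lines of script. This is only a sketch in Python (not SAS), and the function name `guess_encoding` and the file name in the comment are made up for illustration. It only distinguishes pure ASCII, valid UTF-8, and "something else" (such as a Windows "ANSI" code page); it is not a full encoding detector.

```python
def guess_encoding(raw: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', or 'other'.

    'other' typically means a single-byte code page (for example
    Windows-1252, often called "ANSI"), but this sketch cannot tell which.
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


# Usage with a hypothetical file name:
#   with open("departments.txt", "rb") as f:
#       print(guess_encoding(f.read()))

# Small in-memory demonstration:
print(guess_encoding("Cafe".encode("ascii")))
print(guess_encoding("Café".encode("utf-8")))
print(guess_encoding("Café".encode("cp1252")))
```

Note that a file containing only unaccented characters will report as ASCII even if it was saved as UTF-8, since the two are byte-identical in that case.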

Occasional Contributor
Posts: 9

Re: INPUT statement treatment of "accented" ASCII chars

To be more specific, the pointer isn't "jumping backwards". I'm trying to read the field at position 100 (@100) for example, but when there is "Café" in the preceding field, position 100 is blank and the data is actually at 101.

I always assume I'm dealing with ASCII when reading text files on a PC, and the file comes from the USA. So I downloaded Notepad++, found the encoding was UTF-8, put that option on the INFILE statement, and it works fine now!

Thanks for the tip!

Doug

Respected Advisor
Posts: 4,173

Re: INPUT statement treatment of "accented" ASCII chars

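In UTF-8 the little e with accent takes 2 bytes. If the file is read as a single-byte encoding, each of those bytes counts as its own column, so everything after the "é" sits one byte further along than your @ positions expect (you end up one byte "behind").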
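
To make the one-byte shift concrete, here is a small sketch in Python (not SAS). The record layout is invented for illustration: a department name in columns 1-10 followed by a 4-character code, built by a hypothetical helper `make_record`. Slicing the raw bytes by position mimics reading the file with the wrong encoding; decoding as UTF-8 first mimics reading it with the right one.

```python
def make_record(dept: str, code: str) -> bytes:
    """Build a hypothetical fixed-width record: dept padded to 10 characters, then code."""
    return (dept.ljust(10) + code).encode("utf-8")


plain = make_record("Cafe", "A123")
accented = make_record("Café", "A123")

# Same number of characters, but the accented record is one byte longer.
print(len(plain), len(accented))

# Byte-based positions (columns 11-14): correct for the plain record,
# shifted by one for the accented record.
print(plain[10:14])
print(accented[10:14])

# Character-based positions after decoding as UTF-8: both line up again.
print(plain.decode("utf-8")[10:14])
print(accented.decode("utf-8")[10:14])
```

For the accented record the byte-based slice starts with a blank and loses the last character of the code, which matches the "position 100 is blank and the data is actually at 101" symptom described above.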
