Reading in a text file that includes nonstandard characters

Frequent Contributor
Posts: 97

I read in a file daily from our Oracle system that frequently contains nonstandard characters.  Today's file, when opened in Notepad, has one character that looks like an upside-down question mark and another that is an arrow.  When I open the file in a hex editor, the characters are 1A and BF.  When SAS reads this file, it stops at the first of these characters that it encounters, and the worst part is that it doesn't issue an error.  I only know that the file wasn't read completely because the resulting data set is smaller than the original file.  Then I have to find the character and remove it, which isn't easy.  Can anyone suggest a way to deal with this?  Can SAS read the file without stopping on that character?  Can it identify the character and replace it with a blank?

This is an example   .

This is another example ¿ .

Thanks,

Chris

Christopher Johnson
www.codeitmagazine.com

All Replies
Respected Advisor
Posts: 3,785

Re: Reading in a text file that includes nonstandard characters

Frequent Contributor
Posts: 97

Re: Reading in a text file that includes nonstandard characters

The problem is that this should not be a binary file.  It is a text file produced by an export from an Oracle table.  However, the application that enters the data into Oracle doesn't stop users from pasting values into the text box that are not strictly text.  We believe this is how these nonstandard characters get into the file.  I can't find a way to identify and remove them, either in the text file or while reading it into SAS.

I have never read a binary file into SAS.  Would it be possible to read it in as binary and then identify and eliminate the character?

Christopher Johnson
www.codeitmagazine.com
Respected Advisor
Posts: 3,785

Re: Reading in a text file that includes nonstandard characters

If you want to read past '1A'x, use the option; if you don't, don't use it.

Valued Guide
Posts: 3,208

Re: Reading in a text file that includes nonstandard characters

Use RECFM=N to read the file as binary; it will read every byte. The interesting question, though, is the cause.  Are you sure it is not an encoding issue (UTF-8)?

The byte 1A is valid there, as is BF. You could check, for example, with Notepad++.

According to the note data _null_ pointed to, this would have been fixed in 8.2, as if you were running version 6. Treating 1A as an end-of-file marker was a common convention in the old days. Getting it back would be going back in time.
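
For anyone who wants to see where the odd bytes sit before deciding how to handle them, here is a minimal sketch of the RECFM=N idea. The file path and dataset name are placeholders, not anything from the original system, and it assumes a single-byte session encoding so that each INPUT reads exactly one byte.

```sas
/* Sketch only: the path and dataset name below are placeholders. */
data work.suspect_bytes;
    /* RECFM=N treats the file as one byte stream with no record boundaries. */
    infile "C:\data\oracle_export.txt" recfm=n;
    input byte $char1.;            /* one byte per DATA step iteration */
    offset = _n_;                  /* 1-based position of this byte in the file */
    code = rank(byte);             /* numeric value of the byte, 0 to 255 */
    /* Keep control bytes other than tab, LF and CR, plus DEL and anything above it. */
    if (code < 32 and code not in (9, 10, 13)) or code > 126;
    hex = put(byte, $hex2.);       /* two hex digits, for example 1A or BF */
    keep offset hex;
run;
```

Each row of work.suspect_bytes holds the offset and hex value of one flagged byte, which should make characters such as 1A and BF easier to locate in a hex editor.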

---->-- ja karman --<-----
Frequent Contributor
Posts: 97

Re: Reading in a text file that includes nonstandard characters

Thanks!  I will try that.

Christopher Johnson
www.codeitmagazine.com
Solution
‎10-23-2014 11:42 AM
Occasional Contributor
Posts: 12

Re: Reading in a text file that includes nonstandard characters

Hi Chris,

Try the IGNOREDOSEOF option of the infile statement...
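
A minimal sketch of how that option might be used, assuming a Windows SAS session. The path, the dataset name and the 200-character line width are placeholders; a real export would need its own layout in the INPUT statement.

```sas
/* Sketch only: path, dataset name and line width are placeholders. */
data work.oracle_export;
    /* IGNOREDOSEOF makes SAS treat '1A'x (Ctrl-Z) as data rather than end-of-file. */
    infile "C:\data\oracle_export.txt" ignoredoseof truncover lrecl=32767;
    input line $char200.;
    /* Optional: swap the two bytes seen in the hex editor for blanks. */
    line = translate(line, '  ', '1ABF'x);
run;
```

TRANSLATE takes the replacement characters before the characters to be replaced, so the two blanks stand in for 1A and BF respectively. If BF can be a legitimate character in the data (it is the inverted question mark in Latin-1), drop it from the list and replace only 1A.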

Frequent Contributor
Posts: 97

Re: Reading in a text file that includes nonstandard characters

That is fantastic!   Amazing that there was a SAS option specifically to deal with my problem.  Thanks very much!

Christopher Johnson
www.codeitmagazine.com
Valued Guide
Posts: 3,208

Re: Reading in a text file that includes nonstandard characters

- The option is documented in the SAS(R) 9.4 Companion for Windows, Third Edition; it is Windows-specific.

- That still does not explain why Oracle would generate that kind of data.

---->-- ja karman --<-----