pdata
Obsidian | Level 7
Hi SAS,
data readocr;
INFILE '\\sample_src_files\infile_read.txt'
        LRECL=32767 
        encoding='utf-16le'
        TERMSTR=CRLF
        DLM='|'
/*        MISSOVER*/
        DSD firstobs=1 ;
    input
           test_id : $char20.
           test_question_id : $char20.
           test_question_description : $char300.
           test_answer_id : $char10.
           test_answer_description : $char200.
           test_response_value : $char4000.
           test_response_type : $char200.
 
;
run;
I need to read this raw data file.
No matter which encoding option I try, I still get: WARNING: A character that could not be transcoded was encountered.
 
Could you please suggest how to get rid of this warning?
Sample data and code are attached.
Accepted Solution
pdata
Obsidian | Level 7

I added the following. I'm not sure it is the right way to do it, but it worked for me.

I created a copy of the source file using the Windows TYPE command, so the copy is in ASCII format rather than UTF-16.

The step below copies the source file to the CURRENT folder using TYPE. FLG holds the return code:

0 - successful copy

1 - error


options noxwait;

data _null_;
    /* copy the source file with the Windows TYPE command */
    flg = system("type &filepath.\&&&f_name&i. > &filepath.\Current\&&&f_name&i.");
    put flg=;
run;

data read_src;
    infile "&filepath.\Current\&&&f_name&i."
        lrecl=2500
        termstr=crlf
        dlm='|'
        missover
        dsd
        firstobs=2;



14 REPLIES
Tom
Super User

What encoding is your SAS session using?  If you are using a single byte encoding like LATIN1 then there are only 256 possible character codes that SAS can attempt to transcode the characters into.

 

Make sure to run the code with a SAS session that is using UTF-8 encoding.
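
To check which encoding the current session is using, something like the following should do. This is a minimal sketch: SYSENCODING is the automatic macro variable and ENCODING is the system option, and the output will depend on how your session was started.

```sas
/* Print the session encoding to the log (for example wlatin1 or utf-8). */
%put Session encoding: &sysencoding;

/* Show the ENCODING system option and its current value. */
proc options option=encoding;
run;
```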

pdata
Obsidian | Level 7

My SAS session is using the WLATIN1 encoding.


Tom
Super User
So start SAS in a different way so that you are using a UTF-8 encoding instead.
How are you running SAS? Are you using SAS Display Manager to run SAS interactively? Just running background/batch jobs? Are you using some other interface like Enterprise Guide or SAS/Studio? The instructions for opening a session that uses UTF-8 depend on how you are running SAS.
pdata
Obsidian | Level 7

I am using SAS Enterprise Guide to execute the code

Tom
Super User

@pdata wrote:

I am using SAS Enterprise Guide to execute the code


In Enterprise Guide you have to tell it what server you are connecting to.  If your shop has set up SAS properly you should be able to just select a different server to connect to that is using UTF-8 encoding.  Then you will have a better chance of being able to read the file.

pdata
Obsidian | Level 7

I tried all the possible encoding options.

If I use ENCODING='UTF-8', I get the following error:

"A byte order mark indicates that the data is encoded in "utf-16le". This conflicts with the "utf-8" encoding that was specified for the fileref "#LN00124"."

Tom
Super User

@pdata wrote:

I tried with all the possible Encoding options:

If I use Encoding = 'UTF-8'

 I have error as follows

"A byte order mark indicates that the data is encoded in "utf-16le". This conflicts with the "utf-8" encoding that was specified for the fileref "#LN00124".


Changing the encoding that you ask SAS to use when reading the file probably will not help you.  If the file contains a character that cannot be mapped into WLATIN1 (which is a single byte encoding) then you will get that warning.

pdata
Obsidian | Level 7

Is there an option to read the file including special characters?

My organisation uses only Base SAS. I use SAS EG just because I can save my work as a project and use the EG features.

However, although I am using EG, I don't have any servers or metadata defined.

I am working on a SAS data warehouse project. All my target tables are SAS datasets and are accessed through librefs.

What is the best I can do to get rid of the WARNING?

Is there a way to read the sample record from the .txt file that I provided without the warning?

Tom
Super User

So if you are running SAS locally then make sure to start it with Unicode support. On my PC there are multiple start menu options to start SAS with different settings.  If you are using EG to run SAS on your PC then you should also be able to do that.

You could also just try setting ENCODING=ANY and see what type of gibberish is in the file.  In the 16-bit encoding that the BOM (Byte Order Mark) at the beginning of your file says it is using, each character takes two bytes (16 bits).  So read it in two-byte chunks and see what the goofy characters are and where they are.

Normal 7-bit ASCII codes will always have '00'x as the second byte in that encoding.  So here is a data step that will show what it is that is confusing SAS.  The first thing it lists is the BOM itself.  So it looks like the code 'FCFF'x is what is causing the trouble.  Read as a little-endian 16-bit value, that is U+FFFC (the Unicode object replacement character), which has no equivalent in WLATIN1.

 

data _null_;
 infile "&path/infile_read.txt" recfm=f lrecl=2 encoding=any ;
 input x $char2.;
 if substr(x,2) ne '00'x then
 put _n_= 3. +1 x= $2. +1 x $hex4. ;
run;

NOTE: The infile ".../infile_read.txt" is:
      Filename=...\infile_read.txt,
      RECFM=F,LRECL=2,File Size (bytes)=288,
      Last Modified=10Jun2019:11:48:31,
      Create Time=10Jun2019:11:47:31

_N_=1  x=ÿþ  FFFE
_N_=137  x=üÿ  FCFF
NOTE: 144 records were read from the infile ".../infile_read.txt".
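
If it helps to see the values as Unicode code points rather than raw bytes, the same step can swap the two bytes before printing. This is only a sketch under the same assumptions as the step above (same &path macro variable, a UTF-16LE file, and no characters outside the 16-bit range); the CODEPOINT variable is just a name picked for illustration.

```sas
/* Sketch: list each non-ASCII 16-bit code unit as a U+XXXX value. */
data _null_;
    infile "&path/infile_read.txt" recfm=f lrecl=2 encoding=any;
    input x $char2.;
    length codepoint $6;
    /* UTF-16LE stores the low byte first, so print the second byte first */
    codepoint = cats('U+', put(substr(x, 2, 1), $hex2.), put(substr(x, 1, 1), $hex2.));
    if substr(x, 2, 1) ne '00'x then put _n_= codepoint=;
run;
```

For the two records listed in the log above, 'FFFE'x comes out as U+FEFF (the byte order mark) and 'FCFF'x as U+FFFC.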
pdata
Obsidian | Level 7

Until now, I only saw "?" and "->" when I read the file using the import wizard, both in Base SAS and in EG.

With your code I see the same strange special characters.

I also noticed that the single line of the raw data file is read as 144 records, as per the NOTE.

Thank you so much, Tom. Do you have any more suggestions for me?

I keep receiving files like this, and they arrive without any cleansing.

Thanks for all your effort.

Tom
Super User

Then tell them to not send files with strange characters in them.

 

pdata
Obsidian | Level 7

Hi Tom,

Is there a way, or a SAS process, to convert a UTF-16 file to ASCII or UTF-8 without changing the "sasv9.cfg" file?

Thanks

Tom
Super User

@pdata wrote:

Hi Tom, 

Is there a way or SAS process to convert UTF-16 file format to ASCI or UTF-8 with out changing the "sasv9.cfg" file. 

Thanks


The code I posted using ENCODING=ANY on the INFILE statement will let you read in the double byte characters.  You can then roll your own translation if you want.  You could try using the UNICODE() function.

https://documentation.sas.com/?docsetId=nlsref&docsetTarget=p1s61kmqg61m29n109i6dz4ivkbw.htm&docsetV...

 

Not sure what you are going to do with codes like 'FCFF'x. 
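
A roll-your-own translation could look roughly like the step below. Treat it as an untested sketch with several assumptions: the output name infile_read_ascii.txt is made up for illustration, the file is UTF-16LE with a BOM, and any code unit whose second byte is not '00'x is simply replaced with '?' rather than translated.

```sas
/* Sketch: copy a UTF-16LE file to a single-byte file, one character at a time. */
data _null_;
    infile "&path/infile_read.txt" recfm=f lrecl=2 encoding=any;
    file "&path/infile_read_ascii.txt" recfm=n;   /* hypothetical output name; stream output, no added line breaks */
    input x $char2.;
    length c $1;
    if _n_ = 1 and x = 'FFFE'x then return;       /* skip the byte order mark */
    if substr(x, 2, 1) = '00'x then c = substr(x, 1, 1);  /* keep the low byte */
    else c = '?';                                 /* anything else becomes a placeholder */
    put c $char1.;
run;
```

The CR and LF bytes pass through unchanged, so the copy should then be readable with an ordinary INFILE statement using TERMSTR=CRLF and no ENCODING= option.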

