Help reading in File, might be encoding problem?

Super Contributor
Posts: 418

Help reading in File, might be encoding problem?

Hello everyone. My company has a tool that exports data to CSV, all in one column but comma separated. The output file looks exactly like the attached file. However, when I try to read it into SAS, every single character is separated by a space. It almost looks like there is an invisible space and/or tab between every visible character.
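Here is a minimal Python sketch of the symptom. Python is used only to show the byte layout and is not part of the SAS fix. It assumes the export is UTF-16LE, as identified later in this thread. In that encoding each ASCII character is stored as two bytes: the character followed by 0x00. A reader that treats every byte as a separate character therefore gets a NUL after each letter, and the NUL displays as a blank.

```python
# Encode a sample value the way a UTF-16LE file stores it (no BOM here).
raw = "JobJournal".encode("utf-16-le")
print(raw[:6])  # b'J\x00o\x00b\x00'

# Misread: treat every byte as one character (single-byte Latin-1 decoding).
misread = raw.decode("latin-1")
print(len(misread))  # 20: one NUL after every letter

# Rendering each NUL as a blank reproduces "J o b J o u r n a l".
print(misread.replace("\x00", " "))

# Correct reading: decode with the matching encoding.
print(raw.decode("utf-16-le"))  # JobJournal
```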


I also noticed that if I open the file and choose "Save As", the encoding shown is "Unicode". I don't know much about encoding, but could this be causing the problem? I tried to read in the file with the option encoding="unicode", but that did not seem to fix the problem.

I need to build a process that opens and reads from this csv file every night, and this is the only way the application can output data, so this is the only format I can get it in.

I wrote a VBA macro that opens the file, saves it as an XLS, and then saves it again as a CSV, and that seems to fix the problem. However, this is extremely inefficient, and I'd like to know how to do this within SAS itself.
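If a step outside SAS were acceptable, re-encoding the file directly is much cheaper than an Excel round trip. Below is a minimal Python sketch. It assumes the export always starts with a UTF-16 byte-order mark: Python's "utf-16" codec uses the BOM to pick the byte order and drops it from the text. The function name and file names are hypothetical.

```python
import tempfile
from pathlib import Path

def transcode_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a BOM-marked UTF-16 text file as UTF-8 (no BOM).

    newline="" on both sides keeps the original CR/LF line endings as-is.
    """
    with open(src, "r", encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)

# Demo with a throwaway file standing in for the nightly export
# (hypothetical file names).
tmp = Path(tempfile.mkdtemp())
src = tmp / "export_utf16.csv"
dst = tmp / "export_utf8.csv"
src.write_bytes(b"\xff\xfe" + 'a,b,"c, d"\r\n'.encode("utf-16-le"))
transcode_to_utf8(src, dst)
print(dst.read_bytes())  # b'a,b,"c, d"\r\n'
```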

The following code shows the file with all of the spaces:

proc import datafile="c:\urderive\example plz help.csv" out=whynotworking
    replace;
run;

Please note that the dataset should have 4 columns and 1 row of data if the import worked correctly. Instead it has 5 columns and 3 rows, and all of the data has spaces between the characters. For example, "JobJournal" shows up as "J o b J o u r n a l" in the SAS dataset. I can't even paste the SAS values here, because when I copy and paste them anywhere they show up as blank values.

Please let me know if you need any further information to help with this problem.


Super Contributor
Posts: 394

Re: Help reading in File, might be encoding problem?

The file appears to be in UTF-16. Try this:

filename foo 'c:\temp\example plz help.csv' encoding="utf-16" lrecl=92;

proc import out=test datafile=foo dbms=csv;
    getnames=no;
run;

proc print; run;

See Usage Note 16355: Does SAS support UTF-16 encoding?
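UTF-16 comes in two byte orders, and a reader has to use the right one. This small Python sketch shows the same bytes decoded as little-endian and as big-endian. It illustrates the byte-order issue in general. It does not reproduce how SAS interprets encoding="utf-16".

```python
raw = "Job".encode("utf-16-le")   # b'J\x00o\x00b\x00'

right = raw.decode("utf-16-le")   # 'Job'
wrong = raw.decode("utf-16-be")   # each 2-byte unit read in the opposite order

print(right)
print([hex(ord(c)) for c in wrong])  # ['0x4a00', '0x6f00', '0x6200']
```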

Super Contributor
Posts: 418

Re: Help reading in File, might be encoding problem?

Hiya Tim. I will try this code. Can I ask how you knew the file was UTF-16? The only reason I guessed 'unicode' (incorrectly, apparently) is that that's what it says when I open it in Notepad.

Is there some special tool you used to figure that out? I will keep you posted on whether this code works!

Solution
10-01-2014 03:44 PM
Respected Advisor
Posts: 3,786

Re: Help reading in File, might be encoding problem?

Try this. encoding="utf-16le"

Using SAS 9.4 for Windows, I ran this and SAS told me the encoding in the log. I haven't tried it in any other version of SAS.

20   data _null_;
21       infile FT66F001;
22       input;
23       list;
24       run;

NOTE: A byte-order mark in the file "C:\Users\zzzzzzzzz\Downloads\example plz help.csv\example plz help.csv"
      (for fileref "FT66F001") indicates that the data is encoded in "utf-16le".
      This encoding will be used to process the file.
NOTE: The infile FT66F001 is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
1         SourceTable_Value,JournalBatchName_Value,GlobalDimension1Code_Value,MasterJobDimension
     87  _Value 92
2         JobJournalLine,AES-TIME,"clientwithcommainname, N.A.",J03312 60
3          0
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 0.
      The maximum record length was 92.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds

Message was edited by: data _null_
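SAS 9.4 identified the encoding from the byte-order mark (BOM) at the start of the file. This minimal Python sketch does the same kind of check, which can help when inspecting a file before choosing an encoding= value. sniff_bom is a hypothetical helper. It recognizes only the three common BOMs, and a file with no BOM returns None.

```python
import codecs

# The three common byte-order marks; none is a prefix of another.
# UTF-32 is not handled: a UTF-32LE file (FF FE 00 00) would be
# reported as utf-16le by this check.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
]

def sniff_bom(first_bytes: bytes):
    """Return the encoding label implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name
    return None

sample = codecs.BOM_UTF16_LE + "SourceTable_Value".encode("utf-16-le")
print(sniff_bom(sample))          # utf-16le
print(sniff_bom(b"plain ascii"))  # None
```

On a real file you would pass the first few bytes, for example `open(path, "rb").read(4)`.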

Super Contributor
Posts: 418

Re: Help reading in File, might be encoding problem?

So Tim, your solution did not seem to work, but it was on exactly the right track. _null_, this worked perfectly and the file is reading in correctly now.

Can you please tell me how you figured out what encoding the file was? Or at least point me to a paper that I can read to figure this out. I would have never in a million years figured that out.

Respected Advisor
Posts: 3,786

Re: Help reading in File, might be encoding problem?

Did you see my updated post?  That data step INPUT; LIST; is always where I begin with files that give me problems.
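For a similar first look outside SAS, a few lines of Python can dump the raw bytes in hex. This makes a BOM, or a byte pattern like 4A 00 6F 00, obvious. This is a rough sketch: hex_preview is a hypothetical helper, and it shows only the first 16 bytes by default.

```python
def hex_preview(data: bytes, width: int = 16) -> str:
    """Format the first `width` bytes as hex pairs next to a printable view."""
    chunk = data[:width]
    hex_part = " ".join(f"{b:02X}" for b in chunk)
    # Printable ASCII is shown as-is; everything else (NUL, BOM bytes) as '.'
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{hex_part}  {text_part}"

sample = b"\xff\xfe" + "JobJournal".encode("utf-16-le")
print(hex_preview(sample))
# FF FE 4A 00 6F 00 62 00 4A 00 6F 00 75 00 72 00  ..J.o.b.J.o.u.r.
```

On a real file, read the bytes in binary mode (`open(path, "rb").read(16)`) and pass them in.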

Super Contributor
Posts: 418

Re: Help reading in File, might be encoding problem?

Do I need a specific version of SAS to use the LIST statement? When I use it, I don't get any of the information that you get before the following section:

SourceTable_Value,JournalBatchName_Value,GlobalDimension1Code_Value,MasterJobDimension
      87  _Value 92
2         JobJournalLine,AES-TIME,"clientwithcommainname, N.A.",J03312

Respected Advisor
Posts: 3,786

Re: Help reading in File, might be encoding problem?

This is the complete program. Your path will vary.

filename FT66F001 "C:\Users\zzz\Downloads\example plz help.csv\example plz help.csv";
data _null_;
    infile FT66F001;
    input;
    list;
run;

Respected Advisor
Posts: 3,786

Re: Help reading in File, might be encoding problem?

This was done using SAS 9.4 and I don't know if that info will be there with other versions of SAS.

Valued Guide
Posts: 3,208

Re: Help reading in File, might be encoding problem?

If you use Notepad++, you will see that it also recognizes the encoding; it is shown in its Encoding menu.

If you use the TYPE command in a DOS box, you will see several strange characters at the beginning. SAS will also show them when dumping the input line.

That is the BOM (byte-order mark), which is only an indication that Unicode is being used.

There are several Unicode encodings. UTF-8 uses a variable number of bytes (1 to 4) per character.

It is the one commonly used in office software and on the Internet.

The old Latin1 maps onto the first 256 Unicode characters, but only the first 128 (plain ASCII) are stored as the same single bytes in UTF-8.

UTF-16 is different: most characters take exactly 2 bytes (characters outside the Basic Multilingual Plane take 4). It also comes in two byte orders, little-endian (UTF-16LE) and big-endian (UTF-16BE), which is why the BOM matters.
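This quick Python sketch makes the byte counts concrete. It counts the bytes each encoding uses for a few sample characters. It also shows that é is one byte in Latin-1 but two in UTF-8.

```python
samples = ["A", "é", "€", "😀"]

sizes = {}
for ch in samples:
    # "utf-16-le" is used so that no BOM is added to the count.
    sizes[ch] = (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    print(f"U+{ord(ch):04X}  utf-8: {sizes[ch][0]} bytes  utf-16: {sizes[ch][1]} bytes")

# Latin-1 covers only U+0000..U+00FF; its bytes match UTF-8 only for ASCII.
print("A".encode("latin-1") == "A".encode("utf-8"))  # True
print("é".encode("latin-1"), "é".encode("utf-8"))    # b'\xe9' b'\xc3\xa9'
```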

If at some point you see char, space, char, space (in reality a 00x null byte after each character), you are most likely looking at a UTF-16 file.
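When there is no BOM, the char-NUL-char-NUL pattern itself is a usable hint. This is a rough Python heuristic for mostly ASCII text. The name looks_like_utf16le and the 0.3 threshold are arbitrary choices for illustration, not a standard test.

```python
def looks_like_utf16le(data: bytes, threshold: float = 0.3) -> bool:
    """Guess UTF-16LE from the share of NUL bytes at odd offsets.

    ASCII text stored as UTF-16LE has 0x00 in every odd position (the high
    byte of each 2-byte unit); ordinary 8-bit text has almost none.
    UTF-16BE puts the NUL first, so it does not match this check.
    """
    odd = data[1::2]
    if not odd:
        return False
    return odd.count(0) / len(odd) >= threshold

print(looks_like_utf16le("JobJournalLine".encode("utf-16-le")))  # True
print(looks_like_utf16le(b"JobJournalLine"))                     # False
```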

As your SAS session is probably using Latin1, you can lose characters in the conversion. There will be a note when that happens.
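This Python sketch shows what losing characters means in practice. Latin-1 has no slot for a character such as €, so a strict conversion fails, and a lossy one substitutes '?'. It only illustrates the concept; SAS's own behavior and messages are different.

```python
text = "Price: 5 €, café"

# Strict conversion fails on the first character Latin-1 cannot hold.
try:
    text.encode("latin-1")
except UnicodeEncodeError as err:
    print("cannot encode:", repr(err.object[err.start]))  # cannot encode: '€'

# A lossy conversion keeps going but substitutes '?' for such characters.
lossy = text.encode("latin-1", errors="replace")
print(lossy)  # b'Price: 5 ?, caf\xe9'
```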

---->-- ja karman --<-----
Valued Guide
Posts: 3,208

Re: Help reading in File, might be encoding problem?

data _null_, that is a nice encoding message in 9.4.

Recently I tried something like that with UE (SAS University Edition), which is 9.4, but it did not recognize the BOM.

There are several other improvements in 9.4 and its UTF support. PROC CIMPORT still fails when going from Latin1 to UTF-8.

---->-- ja karman --<-----