BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Anotherdream
Quartz | Level 8

Hello everyone. My company has a tool that exporst data to csv all in one column, but comma seperated. The output file looks exactly like the attached file.  However when I try to read this into sas, every single character is separted by a space. It almost looks like there is an invisible space and/or tab between every visible character.


I also notice that if I open the file and try to "save as" the encoding on it is "Unicode".  I don't know very much about encoding but could this be causing the problem?  I tried to read in the file with the option encoding="unicode" but that did not seem to fix the problem.

I need to build a process that opens and reads from this csv file every night, and this is the only way the application can output data, so this is the only format I can get it in.

I wrote a vba macro that would open the file  and save it as an xls, and then save it again as a csv and that seems to fix the problems, however this is extremely inefficient and i'd like to know how to do this within sas itself.

Please note the following code can be run to show the file with all of the spaces.

proc import datafile="c:\urderive\example plz help.csv" out=whynotworking

replace;

run;

Plz note the dataset should have 4 columns, 1 row of data if the import worked correctly, but instead it has 5 columns and 3 rows, and all of the data has spaces between their values. Example "JobJournal" shows up as "J o b J o u r n a l" in the sas dataset.  I actually can't even paste in the sas values because when I copy and paste it to anything it shows up as blank values.

Please let me know if you need any further information to help with this problem.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Try this. encoding="utf-16le"

Using SAS 9.4 for winders I ran this and SAS told me.  I haven't tried it in any other version of SAS..

20   data _null_;
21       infile FT66F001;
22       input;
23       list;
24       run;

NOTE:
A byte-order mark in the file "C:\Users\zzzzzzzzz\Downloads\example plz help.csv\example
      plz help.csv"
(for fileref "FT66F001") indicates that the data is encoded in "utf-16le".
      This encoding will be used to process the
file.
NOTE: The infile FT66F001 is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----
1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
1         SourceTable_Value,JournalBatchName_Value,GlobalDimension1Code_Value,MasterJobDimension
     
87  _Value 92
2         JobJournalLine,AES-TIME,"clientwithcommainname, N.A.",J03312 60
3          0
NOTE:
3 records were read from the infile (system-specific pathname).
      The minimum record length was
0.
      The maximum record length was
92.
NOTE: DATA statement used (Total process time):
      real time          
0.04 seconds
      cpu time           
0.03 seconds

Message was edited by: data _null_

View solution in original post

10 REPLIES 10
Tim_SAS
Barite | Level 11

The file appears to be in UTF-16. Try this:

filename foo 'c:\temp\example plz help.csv' encoding="utf-16" lrecl=92;

proc import out=test datafile=foo dbms=csv;

getnames=no;

run;

proc print;run;

See Usage Note 16355: Does SAS support UTF-16 encoding?

Anotherdream
Quartz | Level 8

Hiya Tim. I will try this code. Cna I ask how you knew the file was in utf-16 form? The only way I guessed it was 'unicode' (incorrectly apparently) is because that's what it says when I open it in Notepad.

Is theresome special tool you used to figure that out?  I will keep you posted on if this code works!

data_null__
Jade | Level 19

Try this. encoding="utf-16le"

Using SAS 9.4 for winders I ran this and SAS told me.  I haven't tried it in any other version of SAS..

20   data _null_;
21       infile FT66F001;
22       input;
23       list;
24       run;

NOTE:
A byte-order mark in the file "C:\Users\zzzzzzzzz\Downloads\example plz help.csv\example
      plz help.csv"
(for fileref "FT66F001") indicates that the data is encoded in "utf-16le".
      This encoding will be used to process the
file.
NOTE: The infile FT66F001 is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----
1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
1         SourceTable_Value,JournalBatchName_Value,GlobalDimension1Code_Value,MasterJobDimension
     
87  _Value 92
2         JobJournalLine,AES-TIME,"clientwithcommainname, N.A.",J03312 60
3          0
NOTE:
3 records were read from the infile (system-specific pathname).
      The minimum record length was
0.
      The maximum record length was
92.
NOTE: DATA statement used (Total process time):
      real time          
0.04 seconds
      cpu time           
0.03 seconds

Message was edited by: data _null_

Anotherdream
Quartz | Level 8

So Tim your solution did not seem to work, but it is on exactly the right track.  _null_ this worked perfectly and the file is reading in correctly now.

Can you please tell me how you figured out what encoding the file was? Or at least point me to a paper that I can read to figure this out. I would have never in a million years figured that out.

data_null__
Jade | Level 19

Did you see my updated post?  That data step INPUT; LIST; is always where I begin with files that give me problems.

Anotherdream
Quartz | Level 8

Do i need a specific version of sas to use the list option? When I use it I don't get any of the information that you get before the following section.

SourceTable_Value,JournalBatchName_Value,GlobalDimension1Code_Value,MasterJobDimension

      87  _Value 92


2         JobJournalLine,AES-TIME,"clientwithcommainname, N.A.",J03312

data_null__
Jade | Level 19

This is the complete program.   Your path will vary.

filename FT66F001 "C:\Users\zzz\Downloads\example plz help.csv\example plz help.csv";
data _null_;
  
infile FT66F001;
  
input;
  
list;
  
run;
data_null__
Jade | Level 19

This was done using SAS 9.4 and I don't know if that info will be there with other versions of SAS.

jakarman
Barite | Level 11

If you use notepad++ you will see it will recognize the encoding also. There is a tab view for that.

Would you use the type command in a dos box you will see several strange chars at the beginning. Sas will also show them dumping the input line.

That's is the bomb marker, only an indication for Unicode usage.

There are several versions Unicode utf8 is the most recent one. It supports variable number of bytes for 1 character.

This one is  common used in office and Internet.

The old Latin1 is almost identical to the first 250 chars of utf8.

Utf16 is older and different as always 2 bytes are representing one char. There are however many variations in Utf16 as code page differences.

Do you at some moment char space char space (in reality 00x) you most likely are seeing an Utf16 dataset.

As your sas session probably is using Latin1 you can loose chars. There will be a note when it happens.

---->-- ja karman --<-----
jakarman
Barite | Level 11

Data null that is a nice encoding message with 9.4.

Recently I tried something like that with ue that is 9.4 but did not recognize the bom.

There are several other improvements wit 9.4 and the utf support. The proc cimport is still failing from Latin1 to utf8.

---->-- ja karman --<-----

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 7458 views
  • 7 likes
  • 4 in conversation