Filipvdr
Pyrite | Level 9

Importing CSV Files with characters like "≤", "m�",  . Which options / encoding to use?

RW9
Diamond | Level 26

Encoding UTF-8 or UTF-16 should be able to cope with those characters.

SuryaKiran
Meteorite | Level 14

What happens when you try to import the file into SAS? Did you get a warning like this one?

WARNING: Some character data was lost during transcoding in column

Your SAS session might be running in wlatin1 or latin1 encoding, in which case you may need to run SAS with UTF-8 encoding. Changing the encoding might require help from your SAS admins, because it usually means changing the configuration file; it all depends on how your environment is set up. If you are running in a server environment, the encoding has to be set at the start of the session.

 

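Here is an example of explicitly setting the Base SAS (not SPD Server) session encoding with the OPTIONS statement (in Base SAS, ENCODING is normally a startup-only option, so this may have to go in the configuration file or on the SAS command line instead):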

options encoding="utf-8"; 

 

In PC SAS, open the utf-8 application. You might find it under SAS>Additional Languages in Windows.
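To see which encoding the current session is actually using, the two checks below should work in any Base SAS session. They are not from the original post; they are only a quick way to confirm whether the session is running in wlatin1/latin1 or in UTF-8.

```sas
/* Show the encoding the current SAS session was started with */
proc options option=encoding;
run;

/* The same value is available in an automatic macro variable */
%put Session encoding: &sysencoding;
```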

 

Thanks,
Suryakiran
Filipvdr
Pyrite | Level 9
data work."W1FTF7X"n / view = work."W1FTF7X"n ;
    infile '/SASDATA/DataResult/ETL_InputFiles/AA/TT.csv'
        lrecl = 428
        delimiter = ';'
        dsd
        missover
        firstobs = 2
        encoding = "utf-8";



I used the code above, but now the characters disappear or are replaced by a space.

 

WARNING: A character that could not be transcoded has been replaced in record 3.
WARNING: A character that could not be transcoded has been replaced in record 4.
WARNING: A character that could not be transcoded has been replaced in record 5.
WARNING: A character that could not be transcoded has been replaced in record 6.
WARNING: A character that could not be transcoded has been replaced in record 7.
WARNING: A character that could not be transcoded has been replaced in record 8.
WARNING: A character that could not be transcoded has been replaced in record 9.
WARNING: A character that could not be transcoded has been replaced in record 10.
WARNING: A character that could not be transcoded has been replaced in record 11.
WARNING: A character that could not be transcoded has been replaced in record 12.
WARNING: A character that could not be transcoded has been replaced in record 13.
WARNING: A character that could not be transcoded has been replaced in record 14.
WARNING: A character that could not be transcoded has been replaced in record 15.
WARNING: A character that could not be transcoded has been replaced in record 16.
WARNING: A character that could not be transcoded has been replaced in record 17.
WARNING: A character that could not be transcoded has been replaced in record 18.
WARNING: A character that could not be transcoded has been replaced in record 19.
WARNING: A character that could not be transcoded has been replaced in record 20.
WARNING: A character that could not be transcoded has been replaced in record 21.
WARNING: A character that could not be transcoded has been replaced in record 22.
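One way to narrow this down, sketched here as an assumption rather than something from the thread, is to dump the first few records as hex and check which bytes the file really contains. In UTF-8, "≤" is stored as the three bytes E2 89 A4; if the special characters show up as single bytes instead, the file is probably not UTF-8 and the encoding= value would need to change. The obs= value and the $hex60. width are arbitrary choices.

```sas
data _null_;
    /* No ENCODING= option: the bytes are read as stored, without transcoding */
    infile '/SASDATA/DataResult/ETL_InputFiles/AA/TT.csv'
        lrecl = 428 firstobs = 2 obs = 5;
    input;                   /* load the record into _INFILE_ */
    put _infile_ $hex60.;    /* write the first 30 bytes of the record as hex to the log */
run;
```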

RW9
Diamond | Level 26

Well, there are some odd things going on with your code here.  First off, why are you using name literals and creating a view as well?

data work."W1FTF7X"n / view = work."W1FTF7X"n ;

That looks very peculiar to me.  It should just be:

data work.w1ftf7x;

This leads on to my second question: you state you are using a CSV file, that is, a comma-separated values file, yet you are specifying a delimiter of ; which is not a comma.  So is the file an actual CSV, or is it a delimited file?  Or is it in fact an Excel file, which the name literals tend to suggest?

 

Finally, encoding needs to be set for the environment, not just the step.  Your environment is set at startup, so you might need your IT group to provide a UTF8 encoding SAS session to be able to work with the data.

 

Filipvdr
Pyrite | Level 9

I'm using DI Studio and a file reader, but I changed the file reader to a user-written body so I can add the encoding option. This explains the view statement.

I'm using a .csv file, but its delimiter is a ";".

Changing the encoding for our whole environment could cause a lot of problems. I'm looking for a solution that applies only when reading this .csv (or delimited file).

EDIT: this should not be in the "new user" section.. don't know how it got here

RW9
Diamond | Level 26

OK, the view might be a DI thing, I have never used it, but it does look very odd to use name literals.

CSV stands for comma-separated values:

https://en.wikipedia.org/wiki/Comma-separated_values

If you use a semicolon as the separator, it is not a CSV, it is a delimited file.  Calling something by a name that does not fit it is very confusing.

 

Anyway, that's a side issue.  If you want to be able to use characters that are not in the session's character set, then you will need a system with the encoding set up correctly.  Changing it for the step will read the data, but the system still cannot represent that data.

Filipvdr
Pyrite | Level 9
Thanks RW9, that is a clear answer. So there is no way to read this data on a system which uses latin1?
SuryaKiran
Meteorite | Level 14

I can suggest an approach that might work, but be cautious, since it bypasses the default configuration file.

 

1) Find the configuration file for utf-8 (sasv9.cfg), usually in "/opt/sas/sashome/prod/SASFoundation/9.4/nls/u8/sasv9.cfg", and copy it to your user folder (/user/name/sasv9.cfg). Make sure the configuration file has all the necessary setup information.

2) Open .profile and add the environment variable. If there are multiple nodes, add it to the profile on each of them.

export SASV9_OPTIONS='-config /user/name/sasv9.cfg'

3) Restart SAS and check whether the default configuration file has been bypassed.

proc options option=config;
run;

Your admins might not like you to do this.

This method worked for me to increase memsize from the default 2G to 6G in my Unix environment. 

Hope this helps.

 

 

 

Thanks,
Suryakiran
RW9
Diamond | Level 26

wlatin1 doesn't support those characters, so even if you load the data, you wouldn't be able to work with them.  Just fire up a separate UTF-8 session and do it in there; I think most sites should have moved to UTF-8 by now anyway.
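If a separate UTF-8 session is not available, one possible workaround is to give up on the original characters and swap them for ASCII stand-ins while reading. The sketch below is not from the thread: it assumes the file really is UTF-8, reads the bytes without transcoding, and replaces the UTF-8 byte sequence for "≤" with "<=" before parsing. The output name work.tt_ascii and the columns col1 and col2 are hypothetical, and any other special character would need its own replacement.

```sas
data work.tt_ascii;                  /* hypothetical output data set */
    /* No ENCODING= option: the bytes are read as stored, without transcoding */
    infile '/SASDATA/DataResult/ETL_InputFiles/AA/TT.csv'
        lrecl = 428 delimiter = ';' dsd missover firstobs = 2;
    length col1 col2 $ 50;           /* hypothetical columns */
    input @;                         /* load the record into _INFILE_ and hold it */
    /* 'E289A4'x is the UTF-8 byte sequence for U+2264 (less-than-or-equal sign) */
    _infile_ = tranwrd(_infile_, 'E289A4'x, '<=');
    input col1 col2;                 /* parse the modified record */
run;
```

The data then contains only characters that a latin1 session can store, at the cost of losing the original symbols.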
