How to read a file with French accent words in SAS unicode version

New Contributor
Posts: 2

I tried to read a file containing French accented words using the Unicode version of SAS. Some of the values are not read correctly; in particular, the data row for August is not split into its three fields. Please help.

****** Data in my text  file (CA_French_Month.txt) ************

janv|janvier|January
févr|février|February
déc |décembre |December
août | août | August
mars|mars|March
avril|avril|April
mai|mai|May
juin|juin|June
juil|juillet|July
sept|septembre|September
oct|octobre|October
nov|novembre|November
déc |décembre |December

************* My sas code ************

data frch_desc;
  length  frc_desc $20 frc_abbr $30 mth_desc $20;
  infile "c:\CA_French_Month.txt" dlm= "|" truncover;
  input
    frc_abbr $
    frc_desc $
    mth_desc $
    ;
run;

proc print;
run;

**************output  ****************

 


                                     The SAS System           18:04 Monday, April 15, 2013   1

                       Obs    frc_desc     frc_abbr                mth_desc

                         1    janvier      janv                    January
                         2    février      févr                    February
                         3    décembre     déc                     December
                         4                 août | août | August
                         5    mars         mars                    March
                         6    avril        avril                   April
                         7    mai          mai                     May
                         8    juin         juin                    June
                         9    juillet      juil                    July
                        10    septembre    sept                    September
                        11    octobre      oct                     October
                        12    novembre     nov                     November
                        13    décembre     déc                     Decembe

Respected Advisor
Posts: 4,659

Re: How to read a file with French accent words in SAS unicode version

If I cut and paste from your posting, everything reads perfectly (I'm using version 9.3 on Windows 7):

                                   The SAS System   21:10 Monday, April 15, 2013   1

                     Obs    frc_desc     frc_abbr    mth_desc

                       1    janvier       janv       January
                       2    février       févr       February
                       3    décembre      déc        December
                       4    août          août       August
                       5    mars          mars       March
                       6    avril         avril      April
                       7    mai           mai        May
                       8    juin          juin       June
                       9    juillet       juil       July
                      10    septembre     sept       September
                      11    octobre       oct        October
                      12    novembre      nov        November
                      13    décembre      déc        December

If I convert the file from ANSI to UTF-8 with Notepad++ and read it again, I get the message

NOTE: A byte order mark in the file "*********************
      ****************\CA_French_Month.txt" (for fileref "#LN00011") indicates
      that the data is encoded in "utf-8".  This encoding will be used to process
      the file.

and the file reads flawlessly.

Try posting your txt file.
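
Until the file is posted, two quick checks may narrow this down: the encoding of the SAS session, and the raw bytes stored in the file. This is only a sketch, assuming the path from the original post and lines shorter than 30 bytes; the names line and hex are illustrative, not from the original code.

```sas
/* Show the encoding of the current SAS session (for example UTF-8 or WLATIN1). */
proc options option=encoding;
run;

/* Write each line of the file to the log in hexadecimal. */
/* No ENCODING= option here, so the bytes are shown as stored in the file. */
data _null_;
  infile "c:\CA_French_Month.txt" truncover;
  input line $char30.;
  hex = put(line, $hex60.);
  put _n_= hex=;
run;
```

In the hex dump, an é stored as the single byte E9 points to a Windows-1252 (ANSI) file, while the pair C3 A9 points to UTF-8.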

PG
Super Contributor
Posts: 273

Re: How to read a file with French accent words in SAS unicode version

Sassavy,

I tried this too, using a French SAS 9.3.2 session started in UTF-8 mode on 32-bit Windows 7 (French locale),

reading a file saved from a text editor (with your French source written correctly in the local encoding).

With

infile "d:\CA_French_Month.txt" dlm= "|" truncover  encoding="pcoem850";

[Screenshot: 2013-04-16 12_06_40-SAS.png]

everything inside SAS seems to work, but are the accented characters really UTF-8?

After copying the contents of the SAS output window into UltraEdit, I see the following:

                            1     janvier       janv       January
                            2     fÚvrier      fÚvr      February
                            3     dÚcembre     dÚc       December
                            4     ao¹t         ao¹t      August
                            5     mars          mars       March
                            6     avril         avril      April
                            7     mai           mai        May
                            8     juin          juin       June
                            9     juillet       juil       July
                           10     septembre     sept       September
                           11     octobre       oct        October
                           12     novembre      nov        November
                           13     dÚcembre     dÚc       December

But I encounter a problem similar to yours with August

if I use your code, or this statement:

infile "d:\CA_French_Month.txt" dlm= "|" truncover  encoding="ansi";

[Screenshot: 2013-04-16 12_09_56-SAS.png]

After copying that output into UltraEdit, I see:

                        1     janvier      janv                January
                        2     février      févr                February
                        3     décembre     déc                 December
                        4                  août|août|August
                        5     mars         mars                March
                        6     avril        avril               April
                        7     mai          mai                 May
                        8     juin         juin                June
                        9     juillet      juil                July
                       10     septembre    sept                September
                       11     octobre      oct                 October
                       12     novembre     nov                 November
                       13     December     déc|décembre

The original data had no blanks around the delimiters, as in this screenshot:

[Screenshot: 2013-04-16 12_14_34-D__CA_French_Month.png]

New Contributor
Posts: 2

Re: How to read a file with French accent words in SAS unicode version

PGStats,

     I am using SAS 9.2, and here is the text file I used.

To give more details: this file is in the default (ANSI) encoding. I need to read this file and eventually load the French words into an Oracle database (11g). I am still curious why all the other rows are read correctly and only August is not.

Any suggestions would help me tremendously.

Thanks
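
Since the file is ANSI but the session is Unicode, one option worth trying is to declare the file's encoding on the INFILE statement, so that the bytes are transcoded to the session encoding as the file is read. The sketch below assumes that "ANSI" here means Windows-1252, which SAS calls wlatin1; apart from that option and the explicit data= on proc print, it is the original step unchanged.

```sas
data frch_desc;
  length frc_desc $20 frc_abbr $30 mth_desc $20;
  /* encoding= states how the external file is stored. */
  /* wlatin1 is an assumption about what "ANSI" means on this machine. */
  infile "c:\CA_French_Month.txt" dlm="|" truncover encoding="wlatin1";
  input frc_abbr $ frc_desc $ mth_desc $;
run;

proc print data=frch_desc;
run;
```

As for why only August fails, a possible explanation (not confirmed anywhere in this thread) is that without encoding= the raw bytes are scanned as if they were UTF-8. In Windows-1252, é is the single byte E9 and û is the single byte FB. E9 has the form of a lead byte for a three-byte UTF-8 sequence, so it could absorb the two bytes after it ("vr", "c " or "ce"), none of which is a delimiter in the spaced data. FB has the form of a lead byte for a longer sequence (five bytes under the original UTF-8 definition), so it could absorb "t | ", delimiter included. That would also be consistent with the unspaced "déc|décembre|December" row shown earlier, where the "|" directly after "déc" is lost.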
