
How to deal column input data with SAS UE (if you're French...)

Contributor
Posts: 43

Hi all,

I've been teaching SAS for a long (long) time, and I have to prepare a course for students who will only have SAS UE (SAS University Edition).

When you teach SAS, you have to cover column input, but I've run into a problem and I'd like to know how to fix it.

See this example:

data test ;
input french $ 1-10 english $ 12-23;
cards;
l'éléphant the elephant
les élèves the students
le guépard the cheetah
l'orange   the orange
;

You'll have no problem with this program if your session encoding is WLATIN1, but SAS UE uses UTF-8, and what you get in columns 1 to 10 for the first record is not "l'éléphant" but only "l'élépha".

In fact, the data look like column input data, but with UTF-8 encoding they no longer are: an accented letter such as é takes two bytes, and the column positions count bytes, not characters.
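
A quick way to see the mismatch is to compare byte-based and character-based functions on the same value. This is a minimal sketch, assuming a session running with ENCODING=UTF-8 as SAS UE does; the variable names are only illustrative. LENGTH() and SUBSTR() work in bytes, like column input, while the K-functions KLENGTH() and KSUBSTR() work in characters:

```sas
/* Sketch: assumes the session encoding is UTF-8 (as in SAS UE). */
data _null_;
  word = "l'éléphant";                    /* 10 characters, 12 bytes in UTF-8 */
  bytes = length(word);                   /* LENGTH counts bytes */
  chars = klength(word);                  /* KLENGTH counts characters */
  first10_bytes = substr(word, 1, 10);    /* bytes 1-10, what column input 1-10 sees */
  first10_chars = ksubstr(word, 1, 10);   /* characters 1-10 */
  put bytes= chars= / first10_bytes= / first10_chars=;
run;
```

In a UTF-8 session the log should report 12 bytes for 10 characters, and first10_bytes should hold the same truncated "l'élépha" that the column input returned.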

I tried adding an INFORMAT statement before the INPUT statement (informat french $14.): it does not work.

I also put my raw data in a .txt file and used the ENCODING= option (wlatin1) of the INFILE statement: it does not work either.

Is there a solution?

(But please don't ask me to use examples that are "compatible" with UTF-8 encoding: I don't want to adapt my examples to SAS, I'd like SAS to adapt to my examples.)

Best regards,

Sébastien

SAS Employee
Posts: 6

Re: How to deal column input data with SAS UE (if you're French...)

Try this:

data test ;
input french $ 1-12 english $13-26;
cards;
l'éléphant  the elephant
les élèves  the students
le guépard  the cheetah
l'orange    the orange
;

I think that when you use accented characters such as é or è, each one takes up 2 bytes in SAS rather than one, and the column positions count bytes. The word l'éléphant therefore takes 12 columns rather than 10: 4 for the two accented é's and 8 for the other, non-accented characters.

I added two spaces between the values to make everything line up, because without the extra space the "t" in "the orange" would have been cut off, while the rest of the English words would have remained intact. Unfortunately, I think your best bet might be to put the data in a .csv or Excel file and then import it into SAS.
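
The delimited-data idea can also be tried without leaving the DATA step. A minimal sketch, assuming the values are separated by commas and kept in-stream with DATALINES instead of an actual .csv file:

```sas
/* Sketch: comma-delimited version of the same data.                */
/* DSD with DLM=',' splits each line at the commas; the : modifier  */
/* applies the $20. informat while still stopping at the delimiter. */
data test;
  infile datalines dsd dlm=',' truncover;
  input french :$20. english :$20.;
datalines;
l'éléphant,the elephant
les élèves,the students
le guépard,the cheetah
l'orange,the orange
;
run;
```

Because each value is read up to the next comma, its byte length no longer matters; the $20. width only has to be large enough, in bytes, for the longest value.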

I hope this helps! If you have any questions or concerns, let me know!

Daniel DuVal

Contributor
Posts: 43

Re: How to deal column input data with SAS UE (if you're French...)

Hello Daniel,

and thanks for your answer ;-)

Yes, it's always possible to change the layout of the data, but I'm looking for a "pure SAS" solution that requires no change to the layout of the raw data (in fact, some option I've never heard of).

If there is no such option, well, the only possible conclusion is: with UTF-8 encoding, SAS can't handle column input if the fields contain characters that need more than one byte to be encoded.

Best regards,

Sébastien

Contributor
Posts: 43

Re: How to deal column input data with SAS UE (if you're French...)

I'm still looking for a solution...

If such a solution exists, I'm quite sure it will also solve this other problem:

data test;
input amount :euro.;
cards;
€123.45
;

No result (see the log): in UTF-8, the euro sign is encoded in three bytes, and the EUROw.d informat no longer recognizes it.

Since we can't change the ENCODING system option (it can only be set at startup, for example in the SASV9.CFG file, which we can't access in SAS UE), is there another way to make SAS use a Latin-1-encoded input buffer?
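
Pending such an option, one possible workaround, offered as an untested sketch rather than a confirmed fix, is to strip the euro sign yourself and read the digits with the standard COMMAw.d informat instead of EUROw.d. It assumes a UTF-8 session and one amount per line:

```sas
/* Sketch: assumes a UTF-8 session and one amount per line. */
data test;
  input;                              /* load the raw line into _INFILE_ */
  /* TRANWRD matches the whole 3-byte euro sign and replaces it with a blank; */
  /* the COMMA12. informat skips the leading blank and reads 123.45.          */
  amount = input(tranwrd(_infile_, '€', ' '), comma12.);
cards;
€123.45
;
run;
```

The replacement works on the complete byte sequence of the € literal, so it does not depend on how many bytes the sign takes in the session encoding.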

Super User
Posts: 6,849

Re: How to deal column input data with SAS UE (if you're French...)

You could keep your example, but change the lesson from how to read data files with fixed byte lengths to a lesson on the impact of UTF8 encoding on the old assumptions of how computers store text.

Your example could be modified to work fine with list mode input if you used a different delimiter or made sure to use double spaces between items (and no double spaces within items).
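
For instance, with at least two blanks between the French and English values, the & modifier lets list input read values that contain single embedded blanks, so byte positions stop mattering. A minimal sketch, where the data lines are the original ones with extra blanks added after each French value:

```sas
/* Sketch: list input with the & modifier (2+ blanks between values). */
data test;
  /* & reads each value until two consecutive blanks or the end of the line, */
  /* so single blanks inside "les élèves" or "the elephant" are kept.        */
  /* $20. is a width in bytes, which leaves room for 2-byte accented letters. */
  input french & $20. english & $20.;
cards;
l'éléphant  the elephant
les élèves  the students
le guépard  the cheetah
l'orange    the orange
;
run;
```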

Solution
2 weeks ago
Super User
Posts: 6,849

Re: How to deal column input data with SAS UE (if you're French...)

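I am not sure if there is a better way, but if you are sure the input only contains characters that can be transcoded into WLATIN1, then perhaps you could use the KCVT() function to transcode each line to WLATIN1 (one byte per character), take the substring by position, and convert the result back to UTF-8?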

data test ;
  length french $20 english $20 ;
  input;
  french=kcvt(substrn(kcvt(_infile_,'utf-8','wlatin1'),1,10),'wlatin1','utf-8');
  english=kcvt(substrn(kcvt(_infile_,'utf-8','wlatin1'),12,12),'wlatin1','utf-8');
cards;
l'éléphant the elephant
les élèves the students
le guépard the cheetah
l'orange   the orange
;

Or KSUBSTR()?

data test ;
  length french $20 english $20 ;
  input @;
  french=ksubstr(_infile_,1,10);
  english=ksubstr(_infile_,12,12);
  input;
cards;
l'éléphant the elephant
les élèves the students
le guépard the cheetah
l'orange   the orange
;
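
The KSUBSTR() approach should also carry over to raw data stored in an external file, such as the WLATIN1 .txt file mentioned in the question: the ENCODING= option tells SAS how the file is encoded so each record is transcoded into the UTF-8 session encoding, and KSUBSTR() then cuts by character. A sketch, where the file path is hypothetical and the first step only creates a sample file:

```sas
/* Sketch: the path is hypothetical; point it at a folder you can write to. */
%let rawfile = /folders/myfolders/animals.txt;

/* Setup only: write the sample data to a WLATIN1-encoded text file. */
data _null_;
  file "&rawfile" encoding='wlatin1';
  put "l'éléphant the elephant";
  put "les élèves the students";
  put "le guépard the cheetah";
  put "l'orange   the orange";
run;

/* Read it back: ENCODING= says the file is WLATIN1, so each record is */
/* transcoded to the UTF-8 session encoding before KSUBSTR cuts it.    */
data test;
  infile "&rawfile" encoding='wlatin1';
  length french english $ 20;
  input;                               /* load the record into _INFILE_ */
  french  = ksubstr(_infile_, 1, 10);  /* characters 1-10 */
  english = ksubstr(_infile_, 12);     /* character 12 to the end of the record */
run;
```

If a real file was created on Windows and is read by SAS UE (which runs on Linux), adding TERMSTR=CRLF to the INFILE statement may also be needed so that a carriage return is not left at the end of english.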

Contributor
Posts: 43

Re: How to deal column input data with SAS UE (if you're French...)

Hi Tom,

and many thanks for your posts.

Yes... I'll have to change my lesson, but I'm afraid I'll lose my students, who have never heard of SAS, if I talk about encoding problems on day 1 of the course... I'll miss the old world where one byte = one character ;-)

The KSUBSTR solution is splendid and quite simple! To me, this is the way to handle column input data when you have "special" characters.

Best regards,

Sébastien

☑ This topic is solved.
