Re: double bytes problem

Olime · Posted 12-03-2016 06:05 AM

Hi all,

I am trying to write a dataset out to a text file. The first column of the dataset contains double-byte characters. The goal is to write the first column as a fixed-length field of 60 and the second column as a fixed-length field of 10.

The problem is that after the extraction, the first field in the first and second rows does not line up at a length of 60. Only the remaining rows, which are pure English, line up correctly.

Sample code:

data booking;
input x :$60. y :$10.;
datalines;
ルアドhello checked
ルアドレス uncheck
Anderson checked
EmmaWatson checked
BradJames checked
proc print;
run;

data _null_;
set booking;
file '/plane_booking.txt';

put
@1 x $60.
@61 y $10.;
run;

Result of the file:

ルアドhello checked
ルアドレス uncheck
Anderson checked
EmmaWatson checked
BradJames checked

Ideal result:

ルアドhello checked
ルアドレス uncheck
Anderson checked
EmmaWatson checked
BradJames checked

Kindly advise, thank you very much!

Ksharp · Posted 12-03-2016 07:32 AM

How about this one:

 
data booking;
input x :$60. y :$10.;
datalines;
梵蒂冈hello checked
苏丹复苏  uncheck
Anderson checked
EmmaWatson checked
BradJames checked
proc print;
run;
 
data _null_;
set booking;
file 'c:\temp\plane_booking.txt';
len=80;
want=x||y;
put want $varying80. len;
run;

Olime · Posted 12-03-2016 10:23 AM

Hi Ksharp, thanks for the reply but the result is the same.

Tom · Posted 12-03-2016 12:32 PM

Are you asking to create a text file with Unicode characters that can take between 1 and 4 bytes each and still be able to read the second column starting at byte number 61?

Why not just create a delimited file instead? Then you do not need to worry about how many bytes or even how many characters are in each field.
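
A minimal sketch of the delimited-file idea, assuming the BOOKING dataset from the first post. The output file name, the pipe delimiter, and the UTF-8 file encoding are assumptions chosen for illustration and are not stated anywhere in the thread.

```sas
/* Sketch only: file name, delimiter and encoding are illustrative assumptions */
data _null_;
  set booking;
  /* DLM= sets the delimiter; DSD quotes any value that itself contains it */
  file 'plane_booking_delim.txt' dlm='|' dsd encoding='utf-8';
  put x y;   /* list-style PUT: values are written trimmed, separated by | */
run;
```

Whatever reads the file can then split each record on the delimiter instead of counting bytes or characters.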

Olime · Posted 12-03-2016 12:59 PM

Hi Tom, yes, this is the requirement.

Tom · Posted 12-03-2016 01:08 PM

What happens if byte 61 falls in the middle of a multi-byte character?

Olime · Posted 12-03-2016 01:21 PM

Hi Tom,

It would be truncated.

Tom · Posted 12-03-2016 01:51 PM

Is SAS putting in too many spaces or not enough?

Either way you should be able to adjust using the difference between the number of characters and the number of bytes in the string.

data test;
  length string $200 ;
  infile cards truncover ;
  input string ;
  Nbytes = length(string);
  Nchars = klength(string);
  Difference = Nbytes - Nchars ;
cards;
ルアドhello
ルアドレス
Anderson
EmmaWatson
BradJames
;
proc print; run;
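
For reference, if the session encoding is UTF-8, each katakana character in the sample takes 3 bytes and each ASCII letter takes 1. So ルアドhello would be 3×3 + 5 = 14 bytes for 8 characters (difference 6), ルアドレス would be 5×3 = 15 bytes for 5 characters (difference 10), and the pure-English rows have a difference of 0. Under a double-byte session encoding the byte counts will be different, so it may be worth checking what your own session reports. A small sketch, where the variable `s` is just a throwaway name for the example:

```sas
/* Show the session encoding that LENGTH() counts bytes in */
proc options option=encoding;
run;

data _null_;
  length s $40;
  s = 'ルアドhello';
  nbytes = length(s);    /* bytes in the session encoding, trailing blanks ignored */
  nchars = klength(s);   /* characters rather than bytes */
  put nbytes= nchars=;   /* written to the SAS log */
run;
```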

So let's output this as fixed length and see what happens. To make it easier to see I will change the spaces in the string to periods and append carets for the extra padding.

data _null_;
  file 'testu8.txt' encoding=utf8;
  length blanks $200 ;
  blanks = repeat('^',199);
  set test ;
  string=ktranslate(string,'.',' ');
  if _n_=1 then put 'NBYTES|NCHARS|DIFF|STRING';
  put nbytes 6. '|' nchars 6. '|' difference 4. '|' @ ;
  put string $15.  blanks $varying15. difference '|' ;
run;

Here is what it looks like in Windows WordPad.

To avoid chopping a character in the middle, use KSUBSTR() to reduce the number of characters until the number of bytes fits within your output field width.

data fix;
 set test;
 do nchars=nchars to 1 by -1 until(nbytes <= 13) ;
   string2 = ksubstr(string,1,nchars);
   nbytes = length(string2);
   put string= string2= nchars= nbytes= ;
 end;
run;
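
Applied to the original BOOKING data, the padding idea might look like the sketch below. It assumes a UTF-8 session (so LENGTH counts UTF-8 bytes), that "aligned" means the second column starts after 60 characters rather than at byte 61, and that y contains only single-byte characters. The file name and the `pad` and `extra` variables are made up for the example. Because x is defined as $60 and the field is 60 bytes wide, no KSUBSTR truncation is needed in this particular case.

```sas
/* Sketch only: assumes a UTF-8 session and character-based alignment */
data _null_;
  set booking;
  file 'plane_booking_padded.txt' encoding='utf-8';  /* hypothetical file name */
  length pad $60;
  pad = ' ';                            /* source of the extra blanks */
  extra = length(x) - klength(x);       /* bytes minus characters */
  /* $60. writes 60 bytes; the additional blanks bring the field up to 60 characters */
  put @1 x $60. pad $varying60. extra y $10.;
run;
```

Note that wide characters are often drawn two columns wide in a monospaced font, so output that is aligned by character count may still not look aligned on screen.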

Ksharp · Posted 12-03-2016 09:52 PM

That is really weird. I got this.
Check attachment.
