double bytes problem

Occasional Contributor
Posts: 12

Hi all,

 

I am trying to write a dataset out to a text file. The first column of the dataset contains double-byte characters. The goal is to write the first column with a fixed length of 60 and the second column with a fixed length of 10.

The problem is that in the output file, the first field of the first and second rows is not padded out to a length of 60. Only the remaining rows, which are pure English, line up correctly.

 

 

Sample code:

 

data booking;
input x :$60. y :$10.;
datalines;
ルアドhello checked
ルアドレス uncheck
Anderson checked
EmmaWatson checked
BradJames checked
;
proc print;
run;

 

data _null_;
set booking;
file '/plane_booking.txt';
put
@1 x $60.
@61 y $10.;
run;

 

 

Result of the file:

ルアドhello                                     checked
ルアドレス                                uncheck
Anderson                                                 checked
EmmaWatson                                          checked
BradJames                                              checked

 

 

Ideal result:

ルアドhello                                             checked 
ルアドレス                                              uncheck 
Anderson                                                 checked 
EmmaWatson                                          checked 
BradJames                                              checked

 

Kindly advise. Thank you very much!

 

Super User
Posts: 10,044

Re: double bytes problem

How about this one:

 

 
data booking;
input x :$60. y $ :10.;
datalines;
梵蒂冈hello checked
苏丹复苏  uncheck
Anderson checked
EmmaWatson checked
BradJames checked
proc print;
run;
 
data _null_;
set booking;
file 'c:\temp\plane_booking.txt';
len=80;
want=x||y;
put want $varying80. len;
run;
Occasional Contributor
Posts: 12

Re: double bytes problem

Hi Ksharp, thanks for the reply but the result is the same.

 

 

Super User
Posts: 7,076

Re: double bytes problem

Are you asking to create a text file with Unicode characters that can take between 1 and 4 bytes each and still be able to read the second column starting at byte number 61?

 

Why not just create a delimited file instead? Then you do not need to worry about how many bytes or even how many characters are in each field.
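
As a rough illustration of the delimited-file idea, here is a minimal sketch. It assumes the BOOKING data set from the first post; the output file name, the pipe delimiter, and the UTF-8 encoding are assumptions chosen for the example, not part of the original requirement.

```sas
/* Sketch only: file name, delimiter and encoding are assumptions. */
data _null_;
  set booking;
  /* DLM= sets the delimiter; DSD quotes any value that contains it */
  file 'plane_booking_delim.txt' dsd dlm='|' encoding='utf-8';
  /* List output: values are trimmed and separated by the delimiter */
  put x y;
run;
```

A file written this way could be read back with the same DSD and DLM= options on an INFILE statement, so neither the byte count nor the character count of each field matters.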

 

Occasional Contributor
Posts: 12

Re: double bytes problem

Hi Tom, yes, this is the requirement.

Super User
Posts: 7,076

Re: double bytes problem

What happens if byte 61 falls in the middle of a multi-byte character?

Occasional Contributor
Posts: 12

Re: double bytes problem

Hi Tom,

It would be truncated.

Super User
Posts: 7,076

Re: double bytes problem

Is SAS putting in too many spaces or not enough?

Either way you should be able to adjust using the difference between the number of characters and the number of bytes in the string.

data test;
  length string $200 ;
  infile cards truncover ;
  input string ;
  Nbytes = length(string);
  Nchars = klength(string);
  Difference = Nbytes - Nchars ;
cards;
ルアドhello
ルアドレス
Anderson
EmmaWatson
BradJames
;
proc print; run;

 

So let's output this as fixed length and see what happens. To make it easier to see I will change the spaces in the string to periods and append carets for the extra padding.

data _null_;
  file 'testu8.txt' encoding=utf8;
  length blanks $200 ;
  blanks = repeat('^',199);
  set test ;
  string=ktranslate(string,'.',' ');
  if _n_=1 then put 'NBYTES|NCHARS|DIFF|STRING';
  put nbytes 6. '|' nchars 6. '|' difference 4. '|' @ ;
  put string $15.  blanks $varying15. difference '|' ;
run;

Here is what it looks like in Windows WordPad.

Capture.PNG

To avoid chopping a character in the middle, use KSUBSTR() to reduce the number of characters until the number of bytes fits within your output field width (13 bytes in this example).

data fix;
 set test;
 do nchars=nchars to 1 by -1 until(nbytes <= 13) ;
   string2 = ksubstr(string,1,nchars);
   nbytes = length(string2);
   put string= string2= nchars= nbytes= ;
 end;
run;
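
Applying the bytes-minus-characters difference to the original BOOKING data might look like the sketch below. It assumes a UTF-8 SAS session (so that LENGTH() counts bytes and KLENGTH() counts characters) and that the goal is for the second column to start at character 61 rather than byte 61. The output file name and the helper variables pad and extra are made up for the example.

```sas
/* Sketch only: assumes a UTF-8 session and the BOOKING data set above. */
data _null_;
  set booking;
  file 'plane_booking.txt' encoding='utf-8';  /* hypothetical output path */
  length pad $60;
  pad = ' ';
  /* bytes minus characters = spaces the byte-based $60. format leaves out */
  extra = length(x) - klength(x);
  put @1 x $60. @;                 /* writes 60 bytes and holds the line */
  if extra > 0 then put pad $varying60. extra @;
  put y $10.;                      /* second column follows the padding */
run;
```

Note that rows with multi-byte characters are then longer than 70 bytes, so a program that reads the file by byte position would no longer find the second column at byte 61. Also, many fonts draw East Asian characters wider than Latin ones, so the columns may still not line up visually even when the character counts match.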

 

Super User
Posts: 10,044

Re: double bytes problem

That is really weird. I got this.
Check attachment.



x.png