Solved: Re: Column pointer controls and UTF-8

gabonzo · Posted 11-02-2020 10:55 AM

Hi,

I am trying to use PUT and column pointer controls to output a table in Markdown format.

I have set SAS to use UTF-8; whenever I encounter a string with an accented character, the column pointer control comes up one character short and the row becomes misaligned.

example:

data have;
input id name $;
datalines;
35 Ontario
24 Québec
46 Manitoba
;
run;

data _null_;
	set have;
	put @32 '|'
	@1 '|'
	@3 name
	@12 '|'
	@14 id;
run;

Results in:

| Ontario  | 35                |
| Québec  | 24                |
| Manitoba | 46                |

I get why that happens (accented characters are multibyte), and it's just a minor nuisance. However, I would like to know if there is a UTF-8-safe way to use PUT and column pointers.

Thanks!
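
To confirm the cause, a minimal sketch (assuming a UTF-8 session and the HAVE data set above) compares the byte count from LENGTH with the character count from KLENGTH. The variable names bytes and chars are only illustrative.

```sas
data _null_;
  set have;
  bytes = length(name);   /* length in bytes, trailing blanks ignored */
  chars = klength(name);  /* length in characters, multibyte-aware */
  put name= bytes= chars=;
run;
```

In a UTF-8 session the two counts should differ only for values such as Québec. Since @ positions are counted in bytes, that difference is the amount by which the columns after the value drift.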

Tom · Posted 11-03-2020 11:30 AM

You will need to use relative cursor movement or calculated cursor positions to adjust for the different number of bytes being written.

data have;
  length id 8 name $10 ;
  input id name ;
  offset=length(name)-klength(name);
datalines;
35 Ontario
24 Québec
46 Manitoba
;

data _null_;
  set have;
  put 
    @1 '| ' name
    @12 +offset '| ' id 
    @32 +offset '|'
  ;
run;

The log shows:

 | Ontario  | 35                |
 | Québec   | 24                |
 | Manitoba | 46                |


Cynthia_sas · Posted 11-03-2020 10:24 AM

Hi:

I'm not encountering your issue. I always find it better to use explicit formats with my PUT and to work from left-to-right when writing my variable values. Here's what I get:

I adjusted the column start positions to account for a longer NAME value, such as Saskatchewan. I also wrote to a TXT file and to the SAS log and did not observe what you describe. If you continue to have issues, I would recommend opening a track with Tech Support.

Cynthia

gabonzo · Posted 11-03-2020 10:45 AM

Hi Cynthia,

I tried your code, and I still have the same issue, both in the log and the output file.

Could you please check what values you got for these options?

DBCSTYPE=UTF8 Specifies the encoding method to use for a double-byte character set.
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.

Thanks!

Cynthia_sas · Posted 11-03-2020 11:16 AM

Hi:
I actually have DBCSTYPE=NONE and ENCODING=WLATIN1 on my system with 9.4M5. However, I still think you need to open a track with Tech Support on this. Encoding issues typically need to be worked through with them because some of the options can only be set at start-up time.

And although I could test the code on SAS OnDemand for Academics which has settings of DBCSTYPE=UTF8 and ENCODING=UTF-8 -- I do not have write access to the CONFIG file used for startup so that I could experiment with any other settings. So I believe your best avenue for help is to work with Tech Support on this.

Cynthia

gabonzo · Posted 11-03-2020 11:36 AM

I like the idea, but it would get a bit clunky if I have to calculate offsets for multiple columns. I think I will try this anyway.

Thanks!

Tom · Posted 11-03-2020 11:40 AM

@gabonzo wrote:
I like the idea, but it would get a bit clunky if I have to calculate offsets for multiple columns. I think I will try this anyway.

Thanks!

It is not a good idea to try to create a fixed-column text file unless you are using a single-byte encoding. It is better to make a report using ODS, or to make a delimited data file, like a CSV file.

If you do use offsets, you just need to know how many fields before the current one could contain non-single-byte characters (basically, how many character fields).

put 
  @1 field1
  @10 +offset1 field2
  @30 +offset1 +offset2 field3 
   ...
;
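
A fuller sketch of the same idea with two text columns, assuming a UTF-8 session. The data set HAVE2, the CAPITAL column, the variables off1 and off2, and the column positions are hypothetical and chosen only for illustration.

```sas
data have2;
  length id 8 name $12 capital $12;
  input id name capital;
  /* extra bytes (beyond the character count) in each text column */
  off1 = length(name) - klength(name);
  off2 = length(capital) - klength(capital);
datalines;
35 Ontario Toronto
24 Québec Québec
46 Manitoba Winnipeg
;

data _null_;
  set have2;
  put @1 '| ' name
      @14 +off1 '| ' capital       /* shifted by the extra bytes in NAME */
      @27 +off1 +off2 '| ' id      /* shifted by NAME and CAPITAL */
      @40 +off1 +off2 '|';
run;
```

Each delimiter is moved right by the total number of extra bytes written before it, so the row with two accented values is shifted by one byte at the second delimiter and by two at the third and fourth.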

gabonzo · Posted 11-03-2020 11:49 AM

My first attempt was in fact to dump the table to CSV and then use a Python script (which, by the way, can handle multibyte characters just fine and has much better formatting options) to convert it to Markdown. That would have complicated my workflow, though, so I tried to do everything in SAS. And no, I can't use a Jupyter/SAS combo at work.

In any case the next step would be to write a macro that can handle arbitrary columns with custom length, so it will just be offsets all the way.

Thank you for the KLENGTH tip!
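
One possible way to avoid accumulating offsets, sketched here as an assumption rather than something proposed in the thread, is to use only relative pointer movement and pad each text column by its own character count. The width of 10 and the variable pad are hypothetical, and the sketch assumes no NAME value is longer than 10 characters.

```sas
data _null_;
  set have;
  /* blanks still needed after NAME to fill a 10-character column */
  pad = 10 - klength(name);
  /* list output writes NAME plus one blank; +pad skips the remainder */
  put '| ' name +pad '| ' id 8. -l '|';
run;
```

Because no absolute @ position is used after the first field, the byte length of earlier columns never enters the calculation, and a macro would only need to generate one pad expression per character column.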

Tom · Posted 11-03-2020 11:54 AM

I don't understand your larger issue. Markdown is a text markup language. Why would you try to create pseudo-fixed-column output if you are using Markdown?

gabonzo · Posted 11-03-2020 12:09 PM

I am using Pandoc to stitch together a Markdown document that is then converted to Word.
The Markdown sources (text and tables) are then stored in a repository for future reference.
I know that a Markdown table doesn't need to be fixed width to work, but I like my sources to be readable.

There are some knitr/SAS tools that seem promising and could do what I need, but unfortunately my IT department doesn't want me to run R on the same computer where I run SAS, so I am trying to come up with an ad hoc solution.
