Nigel_Pain
Lapis Lazuli | Level 10

I've been experimenting with the macros for importing AD data and updating metadata with it (%mdu* in sasautos). One of the things I want to do is populate the Description property of Person objects with a couple of pieces of information, placed on separate lines. I've tried embedding Carriage Return and Line Feed characters, and a combination of the two, but they seem to be stripped out when the metadata is updated. The macro that actually updates the metadata (%mduchglb) uses XML and PROC METADATA to do the updates, so I'm wondering if this is what causes the characters to be removed. Any ideas about this, or how I can get round it?

Thanks,

Nigel Pain

Kurt_Bremser
Super User

You could try inserting XML-valid markup for line breaks; googling "xml line break" returns several links with hints.

As a first (probably stupid) try, I'd see what happens with <BR>.

jakarman
Barite | Level 11

Well, it is XML with a defined encoding. If you are using UTF-8 as the encoding, you could try the CR/LF characters described in Unicode UTR #13: Unicode Newline Guidelines.

Probably the characters are classified as unsafe ones. In URLs, percent-encoding uses %0d %0a (the hex representation) in those cases, allowing only common characters (compare base64).
All is guessing .....

---->-- ja karman --<-----
Nigel_Pain
Lapis Lazuli | Level 10

Thanks Jaap, and Kurt too

I had a bit of a read about placing newlines into XML text, but it seems that all the possible ways of doing it are disallowed: I tried <BR>, %0a and &#xA;, but they all just appear in the text as typed. There seemed to be a suggestion that such text would need to be embedded in a CDATA block, but I'm getting out of my depth trying to work out what to do with that, so I will probably just live with the two pieces of information on the same line, separated by a comma. :(

Kurt_Bremser
Super User

You might have a case where opening a thread with SAS tech support could either bring you to a solution or at least the information that it is not possible at all (metadata could simply have no way of storing/showing arbitrary line breaks in a field).

Nigel_Pain
Lapis Lazuli | Level 10

Aye, that might be my next step. Interestingly, I can get a newline in there if I use the METADATA_SETATTR function in a data step. This works:

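data _null_;
     length uri description $256;
     /* Find the Person object by name; its URI is returned in uri. */
     nobj=metadata_getnobj("omsobj:Person?@Name eq 'Pain NDA (Nigel)'",1,uri);
     /* Join the two pieces of text with a line feed ('0a'x). */
     description = catx("0a"x,"Line 1","Line 2");
     /* Write the description straight to the object's Desc attribute. */
     rc = metadata_setattr(uri,"desc",description);
     put nobj= uri= rc= ;
run;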
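
To confirm that the line feed really was stored, the attribute can be read back. This is a minimal sketch, assuming the same metadata connection options and the same Person name as in the step above; the variable has_lf is introduced here purely for illustration.

```sas
data _null_;
     length uri description $256;
     /* Locate the same Person object that was updated above. */
     nobj = metadata_getnobj("omsobj:Person?@Name eq 'Pain NDA (Nigel)'", 1, uri);
     /* Read the description back from the metadata server. */
     rc = metadata_getattr(uri, "Desc", description);
     /* has_lf (illustrative name) is 1 if a line feed ('0a'x) is present. */
     has_lf = (index(description, '0a'x) > 0);
     put nobj= rc= has_lf=;
     /* A hex dump makes the 0A byte visible in the log. */
     put description= $hex60.;
run;
```

The same check should work whichever route was used to write the description, so it can also be run after a %mduchglb load.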

PaulHomes
Rhodochrosite | Level 12

I just successfully used the following XML to update a user's description to include blank lines via the SAS Management Console's XML Metadata Interface and the Update Metadata tab:

<Person Id="A1234567.A9876543" Desc="First line&#x0a;&#x0a;&#x0a;&#x0a;... several lines later." />

... so the underlying XML API supports it. Have you looked at the XML generated by the %mduchglb macro to see if the line feed comes through OK? If you look at the supplied source for the %mduchglb macro, you'll see you can use the _mduchglb_outrequest_ macro variable for this purpose. The source states:

/* _mduchglb_outrequest_ is a reserved macro variable that must be either undefined */
/* OR be GLOBAL and contain a valid host directory specification.   All blocks of   */
/* generated XML are written to this path when it is present.        */
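
Going only by that source comment, setting the variable might look like the sketch below. The directory /tmp/mdu_xml is a hypothetical placeholder and is assumed to exist already and to be writable by the SAS session.

```sas
/* Assumption: /tmp/mdu_xml is a hypothetical, existing, writable directory. */
/* The source comment requires the variable to be GLOBAL when it is defined. */
%global _mduchglb_outrequest_;
%let _mduchglb_outrequest_ = /tmp/mdu_xml;

/* Show the value in the log before running the load. */
%put NOTE: XML requests will be written to &_mduchglb_outrequest_;

/* Now run %mduchglb as usual. Per the source comment, each block of     */
/* generated XML should be written to that directory, where the Desc     */
/* attribute values can be inspected.                                    */
```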

I just had a quick look through the macro source and saw that it uses an xmltrans macro, which in turn applies htmlencode(strip()) to the description. The docs for htmlencode say it only encodes greater-than (>), less-than (<), and ampersand (&) characters by default. There is an optional second parameter that lets you also encode single-quote/apos ('), double-quote/quot (") and "any character that is not represented in 7-bit ASCII encoding"/7bit. From a quick test, it does not look like 7bit encodes linefeed characters as &#x0a;. Of course, since the macro is supplied with source, you could use your own custom version of it, modified to support encoding of additional characters like this. It might also be worth contacting SAS Technical Support to suggest it as an improvement for future versions.

Here's some sample code that shows the original xmltrans macro and an alternative (xmltrans2) which encodes more.

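/* Original: encodes only >, < and & (the htmlencode defaults). */
%macro xmltrans(str);
     htmlencode(strip(&str))
%mend xmltrans;

/* Alternative: also encodes quotes and non-7-bit characters, then */
/* replaces each line feed with the character reference &#x0a;     */
%macro xmltrans2(str);
     tranwrd(htmlencode(strip(&str), 'amp gt lt apos quot 7bit'), '0a'x, '&#x0a;')
%mend xmltrans2;

data _null_;
     plain=cat("'hello' ", '0a0a'x, '"world"');
     put plain= plain= $hex40.;
     encoded1=%xmltrans(plain);
     put encoded1= encoded1= $hex40.;
     encoded2=%xmltrans2(plain);
     put encoded2= encoded2= $hex100.;
run;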
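
To tie the encoding back to the XML shown at the start of this reply, here is a rough sketch that builds the Person element as a string and prints it to the log. The Id is the placeholder used earlier, the variable names are illustrative, and the macro is repeated so the step runs on its own; it does not call PROC METADATA.

```sas
%macro xmltrans2(str);
     tranwrd(htmlencode(strip(&str), 'amp gt lt apos quot 7bit'), '0a'x, '&#x0a;')
%mend xmltrans2;

data _null_;
     length description $256 xml $512;
     /* Two pieces of information separated by a line feed. */
     description = catx('0a'x, 'Line 1', 'Line 2');
     /* Encode the value, then wrap it in the Desc attribute.         */
     /* A1234567.A9876543 is a placeholder Id, not a real object.     */
     xml = cats('<Person Id="A1234567.A9876543" Desc="',
                %xmltrans2(description),
                '"/>');
     put xml=;
run;
```

Note the order inside xmltrans2: htmlencode runs first and tranwrd second. Reversing them would let htmlencode turn the ampersand of &#x0a; into &amp;, and the reference would then show up as literal text.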

Nigel_Pain
Lapis Lazuli | Level 10

That's it! Updating the XMLTRANS macro so that the HTMLENCODE function has the extra options added, plus doing a TRANWRD of '0a'x to '&#x0a;' got it to work. I'd never have worked that out.

Thanks very much Paul.

PaulHomes
Rhodochrosite | Level 12

No worries Nigel. Happy to help. It was an interesting problem :)

jakarman
Barite | Level 11

Nigel, your SAS data step solution is bypassing the XML protocol, isn't it?

---->-- ja karman --<-----
Nigel_Pain
Lapis Lazuli | Level 10

Jaap, yes, I presumed so.

jakarman
Barite | Level 11

Google that and you find: http://suchan.cz/2011/03/new-line-characters-in-xml/ (encaps?). I wonder whether it would come out correctly in SAS (Java), as it is the browsers that solve that.

XML Syntax (w3schools, just some basics): the newline is there for the human-readable formatting of an XML document. It is ignored/removed by the XML parser.

---->-- ja karman --<-----
jakarman
Barite | Level 11

Paul, nice that you did that with xmltrans and htmlencode. I could work out the cause but not the whole solution. I assume you are very experienced in that area, as the Metacoda plug-ins must be based on working with it.

But reading the htmlencode documentation (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition), I believe you have just shown that there are mistakes in it.
To be fully reliable at I18N Level 2, it should mask not only the reserved HTML/XML characters but also all other characters that are part of the current encoding but not valid as characters, such as the low-order control bytes. Do you think someone at SAS would pick up this thread to do that?

---->-- ja karman --<-----
PaulHomes
Rhodochrosite | Level 12

Jaap, whilst we haven't needed to use the xmltrans macro or the htmlencode function directly in our Metacoda software, we certainly have experience in encoding within XML (and HTML) in general.

I'm not sure whether skipping encoding of a linefeed character within the htmlencode function would be considered a mistake or not. I think it is one of those things that depends upon the context. Often whitespace characters in XML and HTML are not considered significant (where they are used purely for nice formatting for example).  In this case as part of the attribute value it was significant. It might be handy to have an additional option on the htmlencode function to allow you to specify whether certain whitespace characters such as linefeed need to be encoded or not, but then again it's just as easy to use other SAS functions to do that explicit encoding when you need it (personally I was hoping the 7bit option might have encoded the linefeed but did not necessarily expect it would).
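
As an illustration of doing that explicit encoding with other SAS functions, the sketch below extends the same idea to carriage return and tab as well as linefeed. The macro name xmltrans3 and the variable names are hypothetical; nothing here is part of the SAS-supplied macros.

```sas
/* Hypothetical variant: not part of the SAS-supplied %mdu* macros. */
%macro xmltrans3(str);
     tranwrd(tranwrd(tranwrd(
          htmlencode(strip(&str), 'amp gt lt apos quot 7bit'),
          '0d'x, '&#x0d;'),   /* carriage return */
          '0a'x, '&#x0a;'),   /* line feed       */
          '09'x, '&#x09;')    /* horizontal tab  */
%mend xmltrans3;

data _null_;
     length plain $40 encoded3 $200;
     /* Sample text containing CR+LF and a tab. */
     plain = cat('Line 1', '0d0a'x, 'Line', '09'x, '2');
     encoded3 = %xmltrans3(plain);
     put encoded3=;
run;
```

As with xmltrans2, the replacements are applied after htmlencode so that the ampersands in the character references are not themselves encoded.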

jakarman
Barite | Level 11

OK Paul, I grew up in IT needing to understand a lot of technical details: the bytes, bits and nibbles. What I am seeing is a shift away from that, and a lot of people are losing the ground under their feet. The 7-bit ASCII is the real ASCII, the last bit being used as a communication check (telex before fax).
Extended ASCII was introduced with the PC, followed by the Latin-1 types (yes, multiple) and the introduction of code pages. That is all single-byte!
We are moving to multibyte encodings like UTF-8. Everyone is using them without being aware of it,
which introduces this kind of issue. The \n is not very well defined, as you said, but the Unicode organization is proceeding: UAX #14: Unicode Line Breaking Algorithm. Well, SAS claims to follow them. The XML indicates Unicode usage. Let it conform.

---->-- ja karman --<-----
