Updating metadata

Occasional Contributor
Posts: 18

I've been experimenting with the macros for importing AD data and updating metadata with it (%mdu* in sasautos). One of the things I want to do is to populate the Description property of Person objects with a couple of pieces of information, put onto separate lines. I've tried embedding Carriage Return and Line Feed characters, and a combination of the two, but they seem to be stripped out when the metadata is updated. The macro that actually updates the metadata (%mduchglb) uses XML and PROC METADATA to do the actual updates, so I'm wondering if this is what causes the characters to be removed. Any ideas about this, or how I can get round it?

Thanks,

Nigel Pain


Accepted Solutions
Solution
‎05-21-2014 01:53 AM
PROC Star
Posts: 389

Re: Updating metadata

I just successfully used the following XML to update a user's description to include blank lines via the SAS Management Console's XML Metadata Interface and the Update Metadata tab:

<Person Id="A1234567.A9876543" Desc="First line&#x0a;&#x0a;&#x0a;&#x0a;... several lines later." />

... so the underlying XML API supports it. Have you looked at the XML generated by the %mduchglb macro to see if the line breaks come through OK? If you have a look in the supplied source for the %mduchglb macro you'll see you can use an _mduchglb_outrequest_ macro variable for this purpose. It states:

/* _mduchglb_outrequest_ is a reserved macro variable that must be either undefined */
/* OR be GLOBAL and contain a valid host directory specification.   All blocks of   */
/* generated XML are written to this path when it is present.        */
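
To capture the generated XML for inspection, something like the following could be run before the macro is called. This is a minimal sketch: the directory /tmp/mdu_xml is a hypothetical path and would need to exist already on the SAS host. The macro variable name and the requirement that it be GLOBAL come from the comment quoted above.

```sas
/* Hypothetical output directory; it must already exist on the SAS host */
%global _mduchglb_outrequest_;
%let _mduchglb_outrequest_ = /tmp/mdu_xml;

/* Then run the usual update step. The generated XML blocks should be  */
/* written under the directory above, where the Desc attribute values  */
/* can be checked for linefeeds or character references.               */
```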

I just had a quick look through the macro source and saw it is using an xmltrans macro, which in turn uses htmlencode(strip()) on the description. The docs for htmlencode say it only encodes greater-than (>), less-than (<), and ampersand (&) characters by default. There is a second options parameter that lets you also encode single-quote/apos ('), double-quote/quot (") and "any character that is not represented in 7-bit ASCII encoding"/7bit. From a quick test it does not look like 7bit will encode a linefeed character as &#x0a;. Of course, being a macro supplied with source, you could use your own custom version of it modified to support encoding of additional characters like this. It might also be worth contacting SAS Technical Support to suggest it as an improvement for future versions.

Here's some sample code that shows the original xmltrans macro and an alternative (xmltrans2) which encodes more.

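/* Original macro: encodes only the htmlencode defaults (ampersand, less-than, greater-than) */
%macro xmltrans(str);
     htmlencode(strip(&str))
%mend xmltrans;

/* Alternative: also encodes quotes and non-7-bit characters, */
/* then converts each linefeed to a character reference       */
%macro xmltrans2(str);
     tranwrd(htmlencode(strip(&str), 'amp gt lt apos quot 7bit'), '0a'x, '&#x0a;')
%mend xmltrans2;

data _null_;
     /* Test value with single quotes, two linefeeds and double quotes */
     plain=cat("'hello' ", '0a0a'x, '"world"');
     put plain= plain= hex40.;
     encoded1=%xmltrans(plain);
     put encoded1= encoded1= hex40.;
     encoded2=%xmltrans2(plain);
     put encoded2= encoded2= hex100.;
run;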
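
As a rough illustration of how the encoded value would end up in the request XML, the sketch below uses the same xmltrans2 idea to build a Person element like the one shown earlier. The Id is the placeholder from that example, and the variable names desc and xml are made up for illustration. It only writes the string to the log and does not update any metadata.

```sas
/* Same idea as xmltrans2, repeated so the sketch is self-contained */
%macro xmltrans2(str);
     tranwrd(htmlencode(strip(&str), 'amp gt lt apos quot 7bit'), '0a'x, '&#x0a;')
%mend xmltrans2;

data _null_;
     length desc $100 xml $400;
     /* Two pieces of information separated by a linefeed */
     desc = catx('0a'x, 'Line 1', 'Line 2');
     /* Place the encoded description inside the Desc attribute */
     xml = cats('<Person Id="A1234567.A9876543" Desc="', %xmltrans2(desc), '"/>');
     /* Write the resulting element to the log for inspection */
     put xml=;
run;
```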



All Replies
Super User
Posts: 6,928

Re: Updating metadata

You could try to insert xml-valid tags for line breaks, googling "xml line break" returns several links with hints.

As a first (probably stupid) try I'd look what happens with <BR>.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 3,208

Re: Updating metadata

Well, it is XML with a defined encoding. If you are using UTF-8 as the encoding, you could try the Unicode CR and LF characters (see UTR #13: Unicode Newline Guidelines).

The characters are probably classified as unsafe. URL encoding in HTTP uses %0d and %0a (the hex representation) in those cases, so that only common characters are allowed through (compare base64).
This is all guesswork .....

---->-- ja karman --<-----
Occasional Contributor
Posts: 18

Re: Updating metadata

Thanks Jaap, and Kurt too

I had a bit of a read about placing newlines into XML text, but it seems that all the possible ways of doing it are disallowed: I tried <BR>, %0a and &#xA; but they all just appear in the text as typed. There seemed to be a suggestion that such text would need to be embedded in a CDATA block, but I'm getting out of my depth trying to work out what to do with that, so I will probably simply live with the two pieces of information on the same line, separated by a comma. :(

Super User
Posts: 6,928

Re: Updating metadata

You might have a case where opening a thread with SAS tech support could either bring you to a solution or at least the information that it is not possible at all (metadata could simply have no way of storing/showing arbitrary line breaks in a field).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 18

Re: Updating metadata

Aye, that might be my next step. Interestingly, I can get a newline in there if I use the METADATA_SETATTR function in a data step. This works:

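data _null_;
     length uri description $256;
     /* Find the Person object by name and return its URI */
     nobj=metadata_getnobj("omsobj:Person?@Name eq 'Pain NDA (Nigel)'",1,uri);
     /* Join the two pieces of text with a linefeed ('0a'x) */
     description = catx("0a"x,"Line 1","Line 2");
     /* Write the value to the Desc attribute */
     rc = metadata_setattr(uri,"desc",description);
     put nobj= uri= rc= ;
run;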
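
One way to confirm that the linefeed really is stored might be to read the attribute back with METADATA_GETATTR and dump it in hex. This is only a sketch: it assumes the same Person lookup as the step above (the Name value is specific to that site) and an existing metadata server connection.

```sas
data _null_;
     length uri description $256;
     /* Same lookup as the update step; the Name value is site-specific */
     nobj = metadata_getnobj("omsobj:Person?@Name eq 'Pain NDA (Nigel)'", 1, uri);
     /* Read the Desc attribute back from the metadata server */
     rc = metadata_getattr(uri, "Desc", description);
     /* A stored linefeed would show up as 0A in the hex dump */
     put nobj= rc= description= $hex60.;
run;
```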


Occasional Contributor
Posts: 18

Re: Updating metadata

That's it! Updating the XMLTRANS macro so that the HTMLENCODE function has the extra options added, plus doing a TRANWRD of '0a'x to '&#x0a;' got it to work. I'd never have worked that out.

Thanks very much Paul.

PROC Star
Posts: 389

Re: Updating metadata

No worries Nigel. Happy to help. It was an interesting problem :)

Valued Guide
Posts: 3,208

Re: Updating metadata

Nigel, your SAS data step solution is bypassing the XML protocol, isn't it?

---->-- ja karman --<-----
Occasional Contributor
Posts: 18

Re: Updating metadata

Jaap, yes, I presumed so.

Valued Guide
Posts: 3,208

Re: Updating metadata

Google that and you will find: http://suchan.cz/2011/03/new-line-characters-in-xml/ (encapsulation?). I wonder whether it would come out correctly in SAS (Java), as it is the browsers that are solving that there.

XML Syntax (w3schools, just some basics): the newline is there for a human-readable format of an XML doc. It is ignored/removed by the XML parser.

---->-- ja karman --<-----
Valued Guide
Posts: 3,208

Re: Updating metadata

Paul, nice that you did that with xmltrans and htmlencode. I could work out the cause but not the whole solution. I assume you are very experienced in that area, as the Metacoda plug-ins must be based on working with it.


But reading the htmlencode doc (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition), I believe you have just shown that there are mistakes in it.
To be fully reliable at I18N Level 2, it should mask not only the reserved HTML/XML characters but also all other characters that are part of the current encoding but not valid as characters, such as the low-order control bytes. Do you think someone at SAS would pick up this thread to do that?

---->-- ja karman --<-----
PROC Star
Posts: 389

Re: Updating metadata

Jaap, whilst we haven't needed to use the xmltrans macro or the htmlencode function directly in our Metacoda software, we certainly have experience in encoding within XML (and HTML) in general.

I'm not sure whether skipping encoding of a linefeed character within the htmlencode function would be considered a mistake or not. I think it is one of those things that depends upon the context. Often whitespace characters in XML and HTML are not considered significant (where they are used purely for nice formatting for example).  In this case as part of the attribute value it was significant. It might be handy to have an additional option on the htmlencode function to allow you to specify whether certain whitespace characters such as linefeed need to be encoded or not, but then again it's just as easy to use other SAS functions to do that explicit encoding when you need it (personally I was hoping the 7bit option might have encoded the linefeed but did not necessarily expect it would).
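
A minimal sketch of that kind of explicit encoding, extended to carriage return and tab as well as linefeed, might look like the following. It assumes an ASCII-based session encoding (the hex values differ on EBCDIC hosts), and the variable names are made up for illustration. The order matters: htmlencode runs first so that the ampersands in the character references added afterwards are not themselves encoded.

```sas
data _null_;
     length plain encoded $200;
     /* Test value containing CR+LF and a tab */
     plain = cat('Line 1', '0d0a'x, 'Line 2', '09'x, 'end');
     /* Encode the XML-reserved characters first */
     encoded = htmlencode(strip(plain), 'amp gt lt apos quot');
     /* Then replace each control character with a character reference */
     encoded = tranwrd(encoded, '0d'x, '&#x0d;');
     encoded = tranwrd(encoded, '0a'x, '&#x0a;');
     encoded = tranwrd(encoded, '09'x, '&#x09;');
     put encoded=;
run;
```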

Valued Guide
Posts: 3,208

Re: Updating metadata

OK Paul, I have grown up in IT needing to understand a lot of technical details: the bytes, bits and nibbles. What I am seeing is a shift away from that, and a lot of people are losing the ground under their feet. 7-bit ASCII is the real ASCII, with the last bit being used as a communication check (telex, before fax).
Extended ASCII was introduced with the PC; now there are the Latin-1 types (yes, multiple) and the introduction of code pages. That is all single-byte!
We are moving to multibyte encodings like UTF-8. Everyone is using them without being aware of it.
That introduces this kind of issue. The \n is not very well defined, as you said, but the Unicode organization is making progress (UAX #14: Unicode Line Breaking Algorithm). Well, SAS claims to be following them. The XML indicates Unicode usage, so let it conform.

---->-- ja karman --<-----