Satish_Parida
Lapis Lazuli | Level 10

Hi All,

 

I am converting XML files to SAS files using XML mapper.

 

Issue:

We have some character variables whose values have leading and trailing blank characters (tab/space/line feed) in the string in the XML file.

e.g. "

 TEsting 	" 

Following the W3C standard, SAS removes the blank characters and creates the SAS data set without any leading or trailing blank characters.

But we need the leading and trailing blank characters in the Value field.

 

I have attached the Mapper file and the source file and here is the code I have used. Please change the local path in the code before you run it.

 

filename NHL "C:\Users\satish.k.parida\Desktop\ClinicalAuditRecords.xml"; /*File attached*/
filename MAP "C:\Users\satish.k.parida\Desktop\RAVE_ODM_MAP.map";   /*File attached*/

libname NHL xmlv2 xmlmap=MAP; 

data ItemData;
set NHL.ItemData;
run;

 

To check whether the new lines or tabs came through, copy the text from the SAS viewer or editor and paste it into Notepad++ with symbols shown. I am not sure whether this can be run on University Edition.
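As an alternative to pasting into Notepad++, the bytes can be inspected in the SAS log. The following is a minimal sketch, not part of the attached code: it builds a stand-in value by hand, and the variable name Value is an assumption based on the XML attribute name. To check the real data, the same PUT statements could be used in a step that reads NHL.ItemData.

```sas
/* Hypothetical stand-in for one row of NHL.ItemData; the variable name
   Value is an assumption based on the XML attribute name. */
data _null_;
  length Value $40;
  /* tab, line feed, four spaces, text, trailing tab */
  Value = '09'x || '0A'x || '    TEsting ' || '09'x;

  /* $HEX40. prints the first 20 bytes as hex pairs, so a leading
     tab (09) or line feed (0A) is visible directly in the log */
  put Value $hex40.;

  /* Numeric code of the first byte (9 for a tab in an ASCII session) */
  first_byte = rank(char(Value, 1));
  put first_byte=;
run;
```

If the hex dump of the mapped value starts with the code of the first visible letter rather than 09, 0A, or 20, the leading blank characters were dropped before the value reached the data set.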

 

  1. We have defined lengths for all the variables in the Mapper file, and the data in the XML file is much shorter than the assigned length. The issue is that XML Mapper trims those leading and trailing spaces.
  2. If you run the code with the attached files, we expect leading and trailing spaces in the 2nd record. However, blank characters in the middle of the string work fine.

 

The issue is not environment specific. I have run it on Windows and Linux, in both UTF-8 and English SAS sessions.

 

Apologies: I could not upload the XML and the mapper file individually, so I had to upload a zip file.

 

file snippet

<ItemData ItemOID="VS.C4" TransactionType="Upsert" Value="&#x9;&#xA;&#xA;&#xA;&#xA;        TEsting &#x9;special &#xA;character&#x9;&#x9;&#xA;&#xA;&#xA;&#xA;">

 

Thanks in Advance.

 

Tom
Super User

SAS datasets do not have a concept of variable length strings.  How would you have SAS keep track of trailing spaces?

Satish_Parida
Lapis Lazuli | Level 10
This is not about variable lengths. We have defined lengths for all the variables in the Mapper file, and the data in the XML file is much shorter than the assigned length. The issue is that XML Mapper trims those leading and trailing spaces.
If you run the code with the attached files, we expect leading and trailing spaces in the 2nd record. However, blank characters in the middle of the string work fine.
Tom
Super User

@Satish_Parida wrote:
This is not about variable lengths. We have defined lengths for all the variables in the Mapper file, and the data in the XML file is much shorter than the assigned length. The issue is that XML Mapper trims those leading and trailing spaces.
If you run the code with the attached files, we expect leading and trailing spaces in the 2nd record. However, blank characters in the middle of the string work fine.

That doesn't answer the question.  If you define a variable as $10 and assign a string with only 5 characters, then SAS will store 5 extra spaces to fill out the 10 characters.  There is no difference between 'ABC' and 'ABC    '.  To be able to recreate the trailing blanks you would need to store a length value in some other variable, or append an extra non-blank character to the end of every value.
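A minimal sketch of the point above, using made-up variable names and an arbitrary '|' sentinel (both are assumptions for illustration only): the two assignments produce identical stored values, and a trailing non-blank character is one way to keep the true end of a value recoverable.

```sas
data _null_;
  length a b $10 marked $11;
  a = 'ABC';
  b = 'ABC    ';

  /* Both are stored blank-padded to 10 bytes, so they compare equal */
  same = (a = b);
  len_a = length(a);   /* LENGTH ignores trailing blanks */
  len_b = length(b);
  put same= len_a= len_b=;

  /* Sentinel idea: a non-blank character after the value marks its true end.
     The '|' is an arbitrary choice and must not occur in the real data. */
  marked = 'ABC    ' || '|';
  true_len = length(marked) - 1;   /* 7: three letters plus four blanks */
  put true_len=;
run;
```

The log should show same=1 and len_a=3 len_b=3, while true_len=7 recovers the four trailing blanks. For the XML case the sentinel would presumably have to be in the source value before the XML engine reads it, since appending it afterwards cannot bring back blanks that are already gone.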

Satish_Parida
Lapis Lazuli | Level 10

@Tom  We have blank characters that include tabs, spaces, new lines, and line feeds.

 

e.g. &#xA;&#xA; ;&#x9;

 

If line feeds are converted to spaces, the values no longer serve our purpose.

 

 

Tom
Super User

@Satish_Parida wrote:

@Tom  We have blank characters that include tabs, spaces, new lines, and line feeds.

e.g. &#xA;&#xA; ;&#x9;

If line feeds are converted to spaces, the values no longer serve our purpose.

Converting tabs to spaces is a different issue.  SAS can easily store tab characters.

Are you talking about such characters inside of a quoted string in the XML file?

Please post a simple file (as text using {i} Insert Code button in the forum editor) that demonstrates the issue.

Satish_Parida
Lapis Lazuli | Level 10

@Tom 

I am sorry if I am not explaining the issue correctly.

The field in question looks like the following, in the Value attribute of the ItemData element.

 

<ItemData ItemOID="VS.C4" TransactionType="Upsert" Value="&#x9;&#xA;&#xA;&#xA;&#xA;        TEsting &#x9;special &#xA;character&#x9;&#x9;&#xA;&#xA;&#xA;&#xA;">

All the hex codes are converted to their specific values without any issue.

 

The issue is that the blank characters before the first non-blank character and after the last non-blank character are not converted at all; they are trimmed before the data is written to the SAS data set.

Tom
Super User

That is a little clearer now.

Can you convert your little example into a complete valid XML file so others can use it to test?

What happens if you replace the spaces with hexcodes for space character?

<ItemData ItemOID="X1" Value="ABCD">
<ItemData ItemOID="X2" Value="A  D">
<ItemData ItemOID="X3" Value="A&#x20;&#x20;D">

Or are you saying you are just having issues with values like:

<ItemData ItemOID="X4" Value="  ABCD">
<ItemData ItemOID="X5" Value="ABCD  ">

Note that SAS has no way to store the last example Value that would distinguish it from my first example value.  Trailing spaces have no meaning in a SAS character variable. 

Satish_Parida
Lapis Lazuli | Level 10

Q: Can you convert your little example into a complete valid XML file so others can use it to test?

Ans: Attached to the query !

 

Q: What happens if you replace the spaces with hexcodes for space character?

Ans: The hex codes are necessary because the Mapper converts them to the corresponding characters in SAS, e.g. "&#x9;" translates to a tab and "&amp;" translates to an ampersand.

 

Q: Or are you saying you are just having issues with values like:

Ans: The issue is with any kind of blank character that leads or trails the non-blank characters, whether it is a plain space or a hex code for a new line, line feed, or tab.

 

Note: If you run the code and copy and paste the result into Notepad++, you can see the result.

PS: Tabs, quotes, ampersands, and new lines are represented as hex codes in the XML. Spaces don't need hex codes.

If we put a literal tab or new line in the XML file, it is converted to a space when the XML is mapped to SAS.
