I'm trying to import thousands of XML files, and the ones that contain an apostrophe (?) do not seem to work. Some examples are:
- á
- É
- ŕ
I read that this can be solved with xmlprocess=permit. However, I still get the same error. Are there any other solutions?
"Does not seem to work" is awfully vague.
Are there errors in the log? Post the code and log in a code box opened with the "</>" to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.
Where are these characters? In file names? In the path to files? Names of columns? Values in columns?
Those are not apostrophes. The first two are accents that provide information on how the vowel is pronounced. I don't recognize the third one, but it is likely a character from a different language.
Which brings up questions such as which operating system you are using (it may have an impact on case and characters in file descriptors) and which language setting your SAS session is using.
The error I get is:
ERROR: Some code points did not transcode.
occurred at or near line 41982, column 57
ERROR: XML parsing error. Please verify that the XML content is well-formed.
I tried reproducing it with an example, but I can't get the same error. The example below reads the data, but the characters are wrong when I look at them in SAS. I believe it has to do with some language setting. The characters look like they are Czech.
test_xml.xml:
<ssf:SSF>
<app>
<char_field>A</char_field>
</app>
<app>
<char_field>B</char_field>
</app>
<app>
<char_field>á</char_field>
</app>
<app>
<char_field>É</char_field>
</app>
<app>
<char_field>ŕ</char_field>
</app>
</ssf:SSF>
test.map:
<?xml version="1.0" encoding="UTF-8"?>
<SXLEMAP version="2.1" name="SXLEMAP">
<!-- ############################################################ -->
<TABLE name="app">
<TABLE-PATH syntax="XPath">/ssf:SSF/app</TABLE-PATH>
<COLUMN name="char_field">
<PATH syntax="XPath">/ssf:SSF/app/char_field</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>255</LENGTH>
</COLUMN>
</TABLE>
</SXLEMAP>
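/* Fileref for the XML document; it has the same name as the libref below */
filename temp_pl 'loc/test_xml.xml';
/* Fileref for the XMLMap */
filename map_temp 'loc/test.map';
/* Read the XML document through the XMLV2 engine using the map */
libname temp_pl xmlv2 xmlmap = map_temp;
/* Copy the mapped table into WORK */
data test;
  set temp_pl.app;
run;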
Changing the SAS session encoding to UTF-8 should solve the problem. The encoding can only be set when the SAS session starts. If SAS runs on a server, you will need to contact an admin.
Is there no way to specify the encoding when reading the file itself? SAS is indeed on a server, and it seems they do not want to change the encoding.
Can you change the file encoding to UTF-8 with BOM (Notepad++ can do this)? SAS should recognize it and then try to read/convert the characters. But if those characters are not in the current code page, SAS can't read the data properly.
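To illustrate the code page point, here is a small Python sketch (not SAS) that checks which of the three characters from the question can be represented in a few single-byte encodings. The list of encodings is only an assumption for illustration, since the thread does not state the actual session encoding. As far as I know, Python's cp1252 and cp1250 correspond to what SAS calls wlatin1 and wlatin2, but the check is only an approximation of what SAS does when it transcodes.

```python
# Which of the three characters from the thread fit in which code page?
# The encodings below are examples only; the real SAS session encoding
# is not stated in the thread.
CHARS = ["á", "É", "ŕ"]
ENCODINGS = ["latin-1", "cp1252", "cp1250", "utf-8"]


def representable(ch: str, encoding: str) -> bool:
    """Return True if `ch` can be encoded in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def coverage_table(chars=CHARS, encodings=ENCODINGS):
    """Map each encoding to the characters it can represent."""
    return {
        enc: [ch for ch in chars if representable(ch, enc)]
        for enc in encodings
    }


if __name__ == "__main__":
    for enc, ok in coverage_table().items():
        print(f"{enc:8s} can represent: {' '.join(ok)}")
```

If the session turned out to use a Western European code page, á and É would fit but ŕ would not, which would be consistent with a "Some code points did not transcode" error on only some files.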
I can't install Notepad++, and even if I could, I still have thousands of files, which would take too much time.
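For thousands of files, the BOM could be added in bulk instead of by hand in an editor. The following is a minimal sketch, assuming Python 3 is available on a machine that can reach the files and that the XML files are already UTF-8 without a BOM (the example file has no encoding declaration, so UTF-8 is the XML default). The function names and the "*.xml" pattern are hypothetical. Files that are not valid UTF-8 are skipped rather than modified. It is safer to run this on a copy of the data, and it does not change the point made above: characters that do not exist in the session code page still cannot be read.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to `path` if it has none.

    Returns True if the file was changed. Raises UnicodeDecodeError
    if the content is not valid UTF-8 (assumption: inputs are UTF-8).
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False          # already has a BOM, leave untouched
    data.decode("utf-8")      # validation only; raises if not UTF-8
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


def add_bom_to_folder(folder, pattern="*.xml"):
    """Add a BOM to every matching file directly inside `folder`.

    Returns (changed, skipped): files that were rewritten, and files
    that were left alone because they are not valid UTF-8.
    """
    changed, skipped = [], []
    for path in sorted(Path(folder).glob(pattern)):
        try:
            if add_utf8_bom(path):
                changed.append(path)
        except UnicodeDecodeError:
            skipped.append(path)
    return changed, skipped
```

Calling add_bom_to_folder("loc") would process the XML files in the placeholder folder from the example. Anything returned in the skipped list is probably stored in another encoding and would need to be converted first.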