Windward

Correct Character Display And XML Encoding

Overview

If you get an error like this:

net.windward.datasource.DataSourceException: net.windward.datasource.DataSourceException: Dom4jDataSource ctor
            at net.windward.datasource.dom4j.Dom4jDataSource.<init>(Unknown Source)
            ...

Caused by: org.dom4j.DocumentException: Error on line 2 of document  : Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence.
            ...

The problem is that the XML contains a character whose byte value is greater than 127, and that byte is not part of a valid UTF-8 sequence. If you do not set the encoding, Windward assumes the data is encoded as UTF-8. (You get the same error if you explicitly set the encoding to UTF-8.)
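To check whether a file's bytes are well-formed UTF-8 before handing them to the parser, something like the following sketch may help. It uses only the standard java.nio.charset classes; the class name Utf8Check and the method isValidUtf8 are made up for this illustration and are not part of Windward or dom4j.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Windward or dom4j API.
final class Utf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] data) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting a replacement character for bad input.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign, written as an escape so that this
        // source file itself stays pure ASCII.
        String xml = "<pound>\u00A3</pound>";

        // Saved as ISO-8859-1, the pound sign is the single byte 0xA3,
        // which is not a legal UTF-8 sequence on its own.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(latin1)); // false

        // Saved as UTF-8, the same character is the two bytes 0xC2 0xA3.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(utf8)); // true
    }
}
```

A file that fails this check was most likely saved in a single-byte encoding such as ISO-8859-1 or Windows-1252, although the check alone cannot tell you which one.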

Resolution

In UTF-8 a character takes between 1 and 4 bytes. A character in the ASCII range 0-127 takes a single byte, so it works as expected. A character with a higher value takes two or more bytes. For example, an XML file that uses the £ symbol looks like this in an XML viewer:

<?xml version="1.0" encoding="UTF-8"?>
<pound>£</pound>

But if you open the file in a text editor that shows each byte as a separate character, you will see the two bytes (0xC2 0xA3) that UTF-8 actually stores for the £ symbol:

<?xml version="1.0" encoding="UTF-8"?>
<pound>Â£</pound>
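The byte values can also be checked in code. The sketch below, again in plain Java with made-up names (PoundBytes, toHex, toUtf8), shows that £ is stored as 0xC2 0xA3 in UTF-8, that those two bytes read as ISO-8859-1 come out as "Â£", and that a byte sequence written in ISO-8859-1 can be re-encoded as UTF-8. It assumes the source data really is ISO-8859-1; substitute the actual charset of your file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Windward or dom4j API.
final class PoundBytes {

    /** Formats bytes as space-separated uppercase hex, for example "C2 A3". */
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so that bytes above 127 do not print as negative.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes bytes using the given source charset and re-encodes them as UTF-8. */
    public static byte[] toUtf8(byte[] data, Charset source) {
        return new String(data, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign

        byte[] utf8 = pound.getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8)); // C2 A3

        // Reading the UTF-8 bytes as ISO-8859-1 yields two characters:
        // U+00C2 (A with circumflex) followed by U+00A3 (pound sign).
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.equals("\u00C2\u00A3")); // true

        // The same character saved as ISO-8859-1 is a single byte.
        byte[] latin1 = pound.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(latin1)); // A3

        // Re-encoding the ISO-8859-1 bytes produces valid UTF-8.
        System.out.println(toHex(toUtf8(latin1, StandardCharsets.ISO_8859_1))); // C2 A3
    }
}
```

Either approach should clear the error: save the file as UTF-8 so its bytes match the declaration, or keep the bytes as they are and declare the encoding the file actually uses (for example encoding="ISO-8859-1") in the XML declaration.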

For more background, here is a really good article about character encoding:

http://www.joelonsoftware.com/articles/Unicode.html
