Correct Character Display And XML Encoding


If you get an error like this:

net.windward.datasource.DataSourceException: net.windward.datasource.DataSourceException: Dom4jDataSource ctor
            at net.windward.datasource.dom4j.Dom4jDataSource.<init>(Unknown Source)

Caused by: org.dom4j.DocumentException: Error on line 2 of document  : Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence.

The problem is that the file contains a character whose byte value is greater than 127, and that byte is not valid UTF-8. If you do not set the encoding, Windward assumes that the file is encoded as UTF-8. (You get the same error if you explicitly set the encoding to UTF-8.)
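As a minimal sketch of this failure, the Python snippet below (not Windward code; the `price` element and the helper name `parse_price` are made up for illustration) parses the same small document twice. The £ is stored as the single byte 0xA3, as an editor saving in ISO-8859-1 would write it. With a UTF-8 declaration the parser rejects the byte; with a declaration that matches the actual bytes, the document parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical document body: the pound sign saved as ISO-8859-1 is the single byte 0xA3.
BODY = b"<price>\xa3</price>"


def parse_price(declared_encoding: str) -> str:
    """Parse BODY behind an XML declaration that names declared_encoding."""
    declaration = f'<?xml version="1.0" encoding="{declared_encoding}"?>'.encode("ascii")
    return ET.fromstring(declaration + BODY).text


# Declared as UTF-8: 0xA3 on its own is not a valid UTF-8 sequence, so parsing fails.
try:
    parse_price("UTF-8")
    utf8_ok = True
except ET.ParseError:
    utf8_ok = False

# Declared as ISO-8859-1: the declaration matches the bytes, so parsing succeeds.
latin1_text = parse_price("ISO-8859-1")
```

Here `utf8_ok` ends up `False` and `latin1_text` holds the £ character. The same idea applies to any parser: the declared encoding has to match the bytes actually on disk.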


In UTF-8, a character takes between 1 and 4 bytes. Characters in the ASCII subset (0-127) take 1 byte each, so they work as expected. Characters with higher values take 2 or more bytes, which leads to the following. An XML file that uses the £ symbol looks like this in an XML viewer:

<?xml version="1.0" encoding="UTF-8"?>

But if you use a text editor you will see that the actual byte values of the file are:

<?xml version="1.0" encoding="UTF-8"?>
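To make the byte counts concrete, here is a small Python sketch (the sample characters are chosen only for illustration). It counts how many bytes UTF-8 uses for characters of increasing value, then shows what the two UTF-8 bytes of £ look like to a tool that treats every byte as one ISO-8859-1 character. That last step is an assumption about how such a tool behaves, not a description of any particular editor.

```python
# UTF-8 byte counts for sample characters of increasing code point.
samples = ["A", "£", "€", "😀"]
byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

# The pound sign is two bytes in UTF-8: 0xC2 followed by 0xA3.
pound_utf8 = "£".encode("utf-8")

# A tool that reads each byte as one ISO-8859-1 character
# shows two characters where the author typed one.
misread = pound_utf8.decode("iso-8859-1")
```

After this runs, `byte_counts` maps the four samples to 1, 2, 3 and 4 bytes respectively, and `misread` is the two-character string `Â£`.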


Here is a link to a really good article about character encoding!
