org.unbescape.xml
Class XmlEscape

Object
  extended by org.unbescape.xml.XmlEscape

public final class XmlEscape
extends Object

Utility class for performing XML escape/unescape operations.

Configuration of escape/unescape operations

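Escape operations can be (optionally) configured by means of:

type - an XmlEscapeType value, which determines whether escaping is performed with CERs or with decimal/hexadecimal character references.
level - an XmlEscapeLevel value, which determines which characters are considered eligible for escaping.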

Unescape operations need no configuration parameters. Unescape operations will always perform complete unescape of CERs, decimal and hexadecimal references.

Features

This class supports both XML 1.0 and XML 1.1 escape/unescape operations. Whichever XML version is used, only the five predefined XML character entities are supported: &lt;, &gt;, &amp;, &quot; and &apos;. This means there is no support for DTD-defined or user-defined entities.

Each version of XML defines a series of characters that are considered invalid even when escaped (for example, \u0000, the null character). Escape operations will automatically remove these chars.

Also, each version of XML establishes a series of control characters that, even if allowed as valid characters, should always appear escaped. For example: \u0001 to \u0008 in XML 1.1.
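
The two character classes described above can be sketched with plain JDK code. The ranges below are taken from the Char and RestrictedChar productions of the XML 1.0 and XML 1.1 specifications, not from this library's source, and the class and method names (XmlCharClasses, stripInvalidXml10, and so on) are hypothetical names chosen for illustration.

```java
// Sketch only: ranges follow the XML 1.0 / 1.1 specifications.
// All names here are illustrative and are not part of unbescape.
final class XmlCharClasses {

    /** XML 1.0 "Char" production: code points allowed in a document. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 "Char" production: excludes U+0000, surrogates, U+FFFE and U+FFFF. */
    static boolean isValidXml11(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 "RestrictedChar" production: valid, but should appear only as references. */
    static boolean isRestrictedXml11(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
                || cp == 0xB || cp == 0xC
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    /** Drops code points that are invalid in XML 1.0 and keeps everything else unchanged. */
    static String stripInvalidXml10(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(XmlCharClasses::isValidXml10)
            .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Note that \u0001 is invalid in XML 1.0 but valid (and restricted) in XML 1.1, which is why the two versions need separate escape methods.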

This class supports the whole Unicode character set: \u0000 to \u10FFFF, including characters not representable by only one char in Java (>\uFFFF).

Input/Output

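There are two different input/output modes that can be used in escape/unescape operations:

String - the input is a String and the result is returned as a String.
char[] - the input is a char[] (delimited by offset and len) and the result is written to a java.io.Writer.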

Glossary

ER
XML Entity Reference: references to variables used to define shortcuts to standard text or special characters. Entity references start with '&' and end with ';'.
CER
Character Entity Reference: XML Entity Reference used to define a shortcut to a specific character. XML specifies five predefined CERs: &lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; (').
DCR
Decimal Character Reference: base-10 numerical representation of a Unicode codepoint: &#225;
HCR
Hexadecimal Character Reference: hexadecimal numerical representation of a Unicode codepoint: &#xE1;. Note that XML only allows lower-case 'x' for defining hexadecimal character references (in contrast with HTML, which allows both '&#x...;' and '&#X...;').
Unicode Codepoint
Each of the int values that make up the Unicode code space. A codepoint normally corresponds to a single Java char primitive value (codepoint <= \uFFFF), but codepoints \u10000 to \u10FFFF are represented by two chars: a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF).
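
As an illustration of the DCR, HCR and codepoint entries above, the following sketch builds numeric character references with JDK calls only. CharRefSketch and its methods are hypothetical names, and the lower-case hex digits are simply what Integer.toHexString produces (hex digits in XML references are case-insensitive); this is not a statement about the exact output format of the library.

```java
// Sketch only: illustrative names, not part of unbescape.
final class CharRefSketch {

    /** Decimal Character Reference for one code point, e.g. 225 -> "&#225;". */
    static String decimalRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    /** Hexadecimal Character Reference, e.g. 0xE1 -> "&#xe1;" (the 'x' must be lower-case). */
    static String hexRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    /** Emits one reference per code point, so a surrogate pair yields a single reference. */
    static String hexRefs(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // combines a high/low surrogate pair into one int
            out.append(hexRef(cp));
            i += Character.charCount(cp);   // advances by 1 or 2 chars
        }
        return out.toString();
    }
}
```

For example, the two-char string "\uD83D\uDE00" is the single codepoint U+1F600, so hexRefs produces one reference rather than two.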

Since:
1.0
Author:
Daniel Fernández

Method Summary
static void escapeXml10(char[] text, int offset, int len, Writer writer)
           Perform an XML 1.0 level 2 (markup-significant and all non-ASCII chars) escape operation on a char[] input.
static void escapeXml10(char[] text, int offset, int len, Writer writer, XmlEscapeType type, XmlEscapeLevel level)
           Perform a (configurable) XML 1.0 escape operation on a char[] input.
static String escapeXml10(String text)
           Perform an XML 1.0 level 2 (markup-significant and all non-ASCII chars) escape operation on a String input.
static String escapeXml10(String text, XmlEscapeType type, XmlEscapeLevel level)
           Perform a (configurable) XML 1.0 escape operation on a String input.
static void escapeXml10Minimal(char[] text, int offset, int len, Writer writer)
           Perform an XML 1.0 level 1 (only markup-significant chars) escape operation on a char[] input.
static String escapeXml10Minimal(String text)
           Perform an XML 1.0 level 1 (only markup-significant chars) escape operation on a String input.
static void escapeXml11(char[] text, int offset, int len, Writer writer)
           Perform an XML 1.1 level 2 (markup-significant and all non-ASCII chars) escape operation on a char[] input.
static void escapeXml11(char[] text, int offset, int len, Writer writer, XmlEscapeType type, XmlEscapeLevel level)
           Perform a (configurable) XML 1.1 escape operation on a char[] input.
static String escapeXml11(String text)
           Perform an XML 1.1 level 2 (markup-significant and all non-ASCII chars) escape operation on a String input.
static String escapeXml11(String text, XmlEscapeType type, XmlEscapeLevel level)
           Perform a (configurable) XML 1.1 escape operation on a String input.
static void escapeXml11Minimal(char[] text, int offset, int len, Writer writer)
           Perform an XML 1.1 level 1 (only markup-significant chars) escape operation on a char[] input.
static String escapeXml11Minimal(String text)
           Perform an XML 1.1 level 1 (only markup-significant chars) escape operation on a String input.
static void unescapeXml(char[] text, int offset, int len, Writer writer)
           Perform an XML unescape operation on a char[] input.
static String unescapeXml(String text)
           Perform an XML unescape operation on a String input.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

escapeXml10Minimal

public static String escapeXml10Minimal(String text)

Perform an XML 1.0 level 1 (only markup-significant chars) escape operation on a String input.

Level 1 means this method will only escape the five markup-significant characters which are predefined as Character Entity References in XML: <, >, &, " and '.

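This method calls escapeXml10(String, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_1_ONLY_MARKUP_SIGNIFICANT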

This method is thread-safe.

Parameters:
text - the String to be escaped.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.
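
To make the level 1 behaviour and the "same object" return contract concrete, here is a minimal JDK-only sketch. It is not the library's implementation: Level1Sketch and escapeMinimal are hypothetical names, and the sketch does not remove characters that are invalid in XML, which the real escape operations do.

```java
// Sketch only: illustrative names, not part of unbescape.
final class Level1Sketch {

    /** Replaces only <, >, &, " and ' with their predefined CERs. */
    static String escapeMinimal(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null; // allocated lazily, on the first char that needs escaping
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String cer;
            switch (c) {
                case '<':  cer = "&lt;";   break;
                case '>':  cer = "&gt;";   break;
                case '&':  cer = "&amp;";  break;
                case '"':  cer = "&quot;"; break;
                case '\'': cer = "&apos;"; break;
                default:   cer = null;
            }
            if (cer != null) {
                if (out == null) {
                    // Copy the untouched prefix once, then continue appending.
                    out = new StringBuilder(text.length() + 16).append(text, 0, i);
                }
                out.append(cer);
            } else if (out != null) {
                out.append(c);
            }
        }
        // No escaping needed: hand back the very same String instance.
        return out == null ? text : out.toString();
    }
}
```

With the library on the classpath, the corresponding call would be XmlEscape.escapeXml10Minimal(text).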

escapeXml11Minimal

public static String escapeXml11Minimal(String text)

Perform an XML 1.1 level 1 (only markup-significant chars) escape operation on a String input.

Level 1 means this method will only escape the five markup-significant characters which are predefined as Character Entity References in XML: <, >, &, " and '.

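This method calls escapeXml11(String, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_1_ONLY_MARKUP_SIGNIFICANT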

This method is thread-safe.

Parameters:
text - the String to be escaped.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

escapeXml10

public static String escapeXml10(String text)

Perform an XML 1.0 level 2 (markup-significant and all non-ASCII chars) escape operation on a String input.

Level 2 means this method will escape the five markup-significant characters (<, >, &, " and ') and all non-ASCII characters.

This escape will be performed by replacing those chars with the corresponding XML Character Entity Reference (e.g. '&lt;') when such a CER exists for the replaced character, and with a hexadecimal character reference (e.g. '&#x2430;') when there is no CER for the replaced character.

This method calls escapeXml10(String, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_2_ALL_NON_ASCII_PLUS_MARKUP_SIGNIFICANT

This method is thread-safe.

Parameters:
text - the String to be escaped.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.
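
The level 2 rule (a CER where one exists, a hexadecimal reference for every other non-ASCII character) can be sketched as follows. This is an assumption-laden illustration using only the JDK: Level2Sketch is a hypothetical name, invalid characters are not removed, and the lower-case hex digits are an artifact of Integer.toHexString.

```java
// Sketch only: illustrative names, not part of unbescape.
final class Level2Sketch {

    /** Escapes markup-significant chars as CERs and non-ASCII code points as hex references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);       // works per code point, not per char
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        // No CER exists for this character: fall back to a hex reference.
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }
}
```

Applied to the single character \u2430, this sketch yields '&#x2430;', the reference used as an example above.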

escapeXml11

public static String escapeXml11(String text)

Perform an XML 1.1 level 2 (markup-significant and all non-ASCII chars) escape operation on a String input.

Level 2 means this method will escape the five markup-significant characters (<, >, &, " and ') and all non-ASCII characters.

This escape will be performed by replacing those chars with the corresponding XML Character Entity Reference (e.g. '&lt;') when such a CER exists for the replaced character, and with a hexadecimal character reference (e.g. '&#x2430;') when there is no CER for the replaced character.

This method calls escapeXml11(String, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_2_ALL_NON_ASCII_PLUS_MARKUP_SIGNIFICANT

This method is thread-safe.

Parameters:
text - the String to be escaped.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

escapeXml10

public static String escapeXml10(String text,
                                 XmlEscapeType type,
                                 XmlEscapeLevel level)

Perform a (configurable) XML 1.0 escape operation on a String input.

This method will perform an escape operation according to the specified XmlEscapeType and XmlEscapeLevel argument values.

All other String-based escapeXml10*(...) methods call this one with preconfigured type and level values.

This method is thread-safe.

Parameters:
text - the String to be escaped.
type - the type of escape operation to be performed, see XmlEscapeType.
level - the escape level to be applied, see XmlEscapeLevel.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

escapeXml11

public static String escapeXml11(String text,
                                 XmlEscapeType type,
                                 XmlEscapeLevel level)

Perform a (configurable) XML 1.1 escape operation on a String input.

This method will perform an escape operation according to the specified XmlEscapeType and XmlEscapeLevel argument values.

All other String-based escapeXml11*(...) methods call this one with preconfigured type and level values.

This method is thread-safe.

Parameters:
text - the String to be escaped.
type - the type of escape operation to be performed, see XmlEscapeType.
level - the escape level to be applied, see XmlEscapeLevel.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

escapeXml10Minimal

public static void escapeXml10Minimal(char[] text,
                                      int offset,
                                      int len,
                                      Writer writer)
                               throws IOException

Perform an XML 1.0 level 1 (only markup-significant chars) escape operation on a char[] input.

Level 1 means this method will only escape the five markup-significant characters which are predefined as Character Entity References in XML: <, >, &, " and '.

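This method calls escapeXml10(char[], int, int, java.io.Writer, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_1_ONLY_MARKUP_SIGNIFICANT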

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException
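
The char[]/Writer mode can be sketched in the same spirit: only the slice text[offset .. offset+len) is processed, and output goes straight to the Writer without building an intermediate String. WriterModeSketch and its methods are hypothetical names; the sketch covers level 1 only and is not the library's implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Sketch only: illustrative names, not part of unbescape.
final class WriterModeSketch {

    /** Escapes text[offset .. offset+len) at level 1, writing the result to the Writer. */
    static void escapeMinimal(char[] text, int offset, int len, Writer writer) throws IOException {
        if (text == null) {
            return; // nothing is written for null input
        }
        int end = offset + len;
        int runStart = offset; // start of the pending run of chars that need no escaping
        for (int i = offset; i < end; i++) {
            String cer = cerFor(text[i]);
            if (cer != null) {
                writer.write(text, runStart, i - runStart); // flush the untouched run
                writer.write(cer);
                runStart = i + 1;
            }
        }
        writer.write(text, runStart, end - runStart); // flush whatever is left
    }

    /** Convenience wrapper for demonstration: collects the output in a StringWriter. */
    static String escapeToString(char[] text, int offset, int len) {
        StringWriter writer = new StringWriter();
        try {
            escapeMinimal(text, offset, len, writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    private static String cerFor(char c) {
        switch (c) {
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            case '&':  return "&amp;";
            case '"':  return "&quot;";
            case '\'': return "&apos;";
            default:   return null;
        }
    }
}
```

Writing unescaped runs in bulk, rather than char by char, keeps the number of Writer calls low when most of the input needs no escaping.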

escapeXml11Minimal

public static void escapeXml11Minimal(char[] text,
                                      int offset,
                                      int len,
                                      Writer writer)
                               throws IOException

Perform an XML 1.1 level 1 (only markup-significant chars) escape operation on a char[] input.

Level 1 means this method will only escape the five markup-significant characters which are predefined as Character Entity References in XML: <, >, &, " and '.

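This method calls escapeXml11(char[], int, int, java.io.Writer, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_1_ONLY_MARKUP_SIGNIFICANT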

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException

escapeXml10

public static void escapeXml10(char[] text,
                               int offset,
                               int len,
                               Writer writer)
                        throws IOException

Perform an XML 1.0 level 2 (markup-significant and all non-ASCII chars) escape operation on a char[] input.

Level 2 means this method will escape the five markup-significant characters (<, >, &, " and ') and all non-ASCII characters.

This escape will be performed by replacing those chars with the corresponding XML Character Entity Reference (e.g. '&lt;') when such a CER exists for the replaced character, and with a hexadecimal character reference (e.g. '&#x2430;') when there is no CER for the replaced character.

This method calls escapeXml10(char[], int, int, java.io.Writer, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_2_ALL_NON_ASCII_PLUS_MARKUP_SIGNIFICANT

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException

escapeXml11

public static void escapeXml11(char[] text,
                               int offset,
                               int len,
                               Writer writer)
                        throws IOException

Perform an XML 1.1 level 2 (markup-significant and all non-ASCII chars) escape operation on a char[] input.

Level 2 means this method will escape the five markup-significant characters (<, >, &, " and ') and all non-ASCII characters.

This escape will be performed by replacing those chars with the corresponding XML Character Entity Reference (e.g. '&lt;') when such a CER exists for the replaced character, and with a hexadecimal character reference (e.g. '&#x2430;') when there is no CER for the replaced character.

This method calls escapeXml11(char[], int, int, java.io.Writer, XmlEscapeType, XmlEscapeLevel) with the following preconfigured values:

type - XmlEscapeType.CHARACTER_ENTITY_REFERENCES_DEFAULT_TO_HEXA
level - XmlEscapeLevel.LEVEL_2_ALL_NON_ASCII_PLUS_MARKUP_SIGNIFICANT

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException

escapeXml10

public static void escapeXml10(char[] text,
                               int offset,
                               int len,
                               Writer writer,
                               XmlEscapeType type,
                               XmlEscapeLevel level)
                        throws IOException

Perform a (configurable) XML 1.0 escape operation on a char[] input.

This method will perform an escape operation according to the specified XmlEscapeType and XmlEscapeLevel argument values.

All other char[]-based escapeXml10*(...) methods call this one with preconfigured type and level values.

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
type - the type of escape operation to be performed, see XmlEscapeType.
level - the escape level to be applied, see XmlEscapeLevel.
Throws:
IOException

escapeXml11

public static void escapeXml11(char[] text,
                               int offset,
                               int len,
                               Writer writer,
                               XmlEscapeType type,
                               XmlEscapeLevel level)
                        throws IOException

Perform a (configurable) XML 1.1 escape operation on a char[] input.

This method will perform an escape operation according to the specified XmlEscapeType and XmlEscapeLevel argument values.

All other char[]-based escapeXml11*(...) methods call this one with preconfigured type and level values.

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
type - the type of escape operation to be performed, see XmlEscapeType.
level - the escape level to be applied, see XmlEscapeLevel.
Throws:
IOException

unescapeXml

public static String unescapeXml(String text)

Perform an XML unescape operation on a String input.

No additional configuration arguments are required. Unescape operations will always perform complete XML 1.0/1.1 unescape of CERs, decimal and hexadecimal references.

This method is thread-safe.

Parameters:
text - the String to be unescaped.
Returns:
The unescaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no unescaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.
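
A minimal sketch of what "complete unescape of CERs, decimal and hexadecimal references" involves, again using only the JDK. UnescapeSketch and its methods are hypothetical names, the handling of malformed or unknown references (copied through unchanged) is an assumption of this sketch, and none of it is taken from the library's source.

```java
// Sketch only: illustrative names, not part of unbescape.
final class UnescapeSketch {

    /** Resolves the five predefined CERs plus decimal and hexadecimal character references. */
    static String unescape(String text) {
        if (text == null || text.indexOf('&') < 0) {
            return text; // nothing to unescape: hand back the same object
        }
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i + 1) : -1;
            int cp = (semi > 0) ? resolve(text.substring(i + 1, semi)) : -1;
            if (cp >= 0) {
                out.appendCodePoint(cp); // may append a surrogate pair
                i = semi + 1;
            } else {
                out.append(c);           // not a recognised reference: copy as-is
                i++;
            }
        }
        return out.toString();
    }

    /** Returns the code point for a reference body such as "lt", "#225" or "#xE1", or -1. */
    private static int resolve(String body) {
        switch (body) {
            case "lt":   return '<';
            case "gt":   return '>';
            case "amp":  return '&';
            case "quot": return '"';
            case "apos": return '\'';
            default:     break;
        }
        try {
            int cp;
            if (body.matches("#x[0-9a-fA-F]+")) {        // lower-case 'x' only, as XML requires
                cp = Integer.parseInt(body.substring(2), 16);
            } else if (body.matches("#[0-9]+")) {
                cp = Integer.parseInt(body.substring(1));
            } else {
                return -1;
            }
            return Character.isValidCodePoint(cp) ? cp : -1;
        } catch (NumberFormatException e) {
            return -1; // digits overflowed an int
        }
    }
}
```

With the library on the classpath, the corresponding call would be XmlEscape.unescapeXml(text).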

unescapeXml

public static void unescapeXml(char[] text,
                               int offset,
                               int len,
                               Writer writer)
                        throws IOException

Perform an XML unescape operation on a char[] input.

No additional configuration arguments are required. Unescape operations will always perform complete XML 1.0/1.1 unescape of CERs, decimal and hexadecimal references.

This method is thread-safe.

Parameters:
text - the char[] to be unescaped.
offset - the position in text at which the unescape operation should start.
len - the number of characters in text that should be unescaped.
writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException


Copyright © 2014 The UNBESCAPE team. All rights reserved.