org.unbescape.java
Class JavaEscape

Object
  extended by org.unbescape.java.JavaEscape

public final class JavaEscape
extends Object

Utility class for performing Java escape/unescape operations.

Configuration of escape/unescape operations

Escape operations can be (optionally) configured by means of a level (see JavaEscapeLevel), which determines which characters are escaped.

Unbescape does not define a 'type' for Java escaping (just a level) because, given the way Unicode Escapes work in Java, it is not possible to choose whether to escape, for example, a tab character (U+0009) as a Single Escape Char (\t) or as a Unicode Escape (\u0009). Because Unicode Escapes are processed by the compiler and not at runtime, writing \u0009 instead of \t would insert an actual tab character into the source code before compilation, which is not equivalent to writing "\t" in a String literal.

Unescape operations need no configuration parameters. Unescape operations will always perform complete unescape of SECs (\n), u-based (\u00E1) and octal (\341) escapes.

Features

Specific features of the Java escape/unescape operations performed by means of this class:

Specific features of Unicode Escapes in Java

The way Unicode Escapes work in Java differs from other languages such as JavaScript. In Java, these UHEXA escapes are processed by the compiler itself, and are therefore resolved before any other type of escape. Besides, UHEXA escapes can appear anywhere in the code, not only in String literals. This means that, while in JavaScript 'a\u005Cna' would be displayed as a\na, in Java "a\u005Cna" would in fact be displayed on two lines: a+<LF>+a.

Going even further, this is perfectly valid Java code:

final String hello = \u0022Hello, World!\u0022;

Also, Java allows any number of 'u' characters in this type of escape, like \uu00E1 or even \uuuuuuuuu00E1. This exists for compatibility with older code-processing tools that did not support Unicode processing at all, which would fail when finding a Unicode escape like \u00E1, but not \uu00E1 (because they would consider \u as the escape). So this is valid Java code too:

final String hello = \uuuuuuuu0022Hello, World!\u0022;
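The compiler behaviour described above can be checked with a small self-contained class. This is only an illustrative sketch: the class name CompilerUnicodeEscapes and its constants are made up for this example and are not part of Unbescape.

```java
final class CompilerUnicodeEscapes {

    // The compiler first turns the Unicode Escape into a backslash, and only then
    // reads "backslash + n" as a line feed, so this literal holds three chars.
    static final String TWO_LINES = "a\u005Cna";

    // Unicode Escapes may even stand in for the String delimiters themselves.
    static final String HELLO = \u0022Hello, World!\u0022;

    // Any number of 'u' characters is accepted after the backslash.
    static final String HELLO_MANY_U = \uuuuuuuu0022Hello, World!\u0022;

    public static void main(String[] args) {
        System.out.println(TWO_LINES.length());
        System.out.println(HELLO);
        System.out.println(HELLO.equals(HELLO_MANY_U));
    }
}
```

Comments in such a file must not contain a raw backslash followed by 'u' unless it forms a valid escape, since the compiler also processes Unicode Escapes inside comments.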

In order to correctly unescape Java UHEXA escapes like "a\u005Cna", Unbescape will perform a two-pass process so that all unicode escapes are processed in the first pass, and then the single escape characters and octal escapes in the second pass.
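The two-pass idea can be sketched with the standard library alone. The code below is not Unbescape's implementation; the class TwoPassUnescapeSketch and its method names are hypothetical, and the handling of malformed escapes (left untouched) is an assumption made for brevity.

```java
final class TwoPassUnescapeSketch {

    /** Pass 1: resolve unicode escapes (a backslash, one or more 'u', four hex digits). */
    static String resolveUnicodeEscapes(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int precedingBackslashes = 0; // contiguous raw backslashes just before position i
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            // A backslash can start a unicode escape only after an even number of backslashes.
            if (c == '\\' && precedingBackslashes % 2 == 0) {
                int j = i + 1;
                while (j < in.length() && in.charAt(j) == 'u') {
                    j++;
                }
                if (j > i + 1 && hasFourHexDigits(in, j)) {
                    out.append((char) Integer.parseInt(in.substring(j, j + 4), 16));
                    i = j + 4;
                    precedingBackslashes = 0;
                    continue;
                }
            }
            precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Pass 2: resolve single escape characters and octal escapes (up to octal 377). */
    static String resolveSimpleEscapes(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c != '\\' || i + 1 >= in.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = in.charAt(i + 1);
            int sec = "btnfr\"'\\".indexOf(next);
            if (sec >= 0) {
                out.append("\b\t\n\f\r\"'\\".charAt(sec));
                i += 2;
            } else if (next >= '0' && next <= '7') {
                int maxDigits = (next <= '3') ? 3 : 2; // keeps the value within 0..255
                int j = i + 1;
                int value = 0;
                while (j < in.length() && j < i + 1 + maxDigits
                        && in.charAt(j) >= '0' && in.charAt(j) <= '7') {
                    value = value * 8 + (in.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
            } else {
                out.append(c); // unknown escape: keep the backslash as-is
                i++;
            }
        }
        return out.toString();
    }

    static String unescape(String text) {
        return resolveSimpleEscapes(resolveUnicodeEscapes(text));
    }
}
```

Running the passes in this order is what makes the escaped backslash in "a\u005Cna" combine with the following n into a line feed; resolving everything in a single pass would leave a literal backslash followed by n.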

Input/Output

There are two different input/output modes that can be used in escape/unescape operations: String input with String output, and char[] input with java.io.Writer output.

Glossary

SEC
Single Escape Character: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
UHEXA escapes
Also called u-based hexadecimal escapes or simply unicode escapes: complete representation of unicode codepoints up to U+FFFF, with \u followed by exactly four hexadecimal figures: \u00E1. Unicode codepoints > U+FFFF can be represented in Java by means of two UHEXA escapes (a surrogate pair).
Octal escapes
Octal representation of unicode codepoints up to U+00FF, with \ followed by up to three octal figures: \071. Though up to three octal figures are allowed, octal numbers > 377 (0xFF) are not supported. Octal escapes are never produced by escape operations (they are only recognized when unescaping) because their use is not recommended by the Java Language Specification (it is allowed mainly for C compatibility reasons).
Unicode Codepoint
Each of the int values making up the Unicode code space. A codepoint normally corresponds to a single Java char primitive value (codepoint <= U+FFFF), but codepoints U+10000 to U+10FFFF take two chars: a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF).
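As an illustration of the UHEXA and codepoint entries above, the following sketch renders a codepoint as one or two UHEXA escapes using only java.lang.Character. The class UhexaSketch and its toUhexa method are hypothetical names for this example, not part of Unbescape.

```java
final class UhexaSketch {

    /** Renders a codepoint as one UHEXA escape, or as two (a surrogate pair) above U+FFFF. */
    static String toUhexa(int codepoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars yields one char for the BMP and a surrogate pair otherwise.
        for (char c : Character.toChars(codepoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUhexa(0xE1));
        System.out.println(toUhexa(0x1F600));
    }
}
```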


Since:
1.0
Author:
Daniel Fernández

Method Summary
static void escapeJava(char[] text, int offset, int len, Writer writer)
           Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.
static void escapeJava(char[] text, int offset, int len, Writer writer, JavaEscapeLevel level)
           Perform a (configurable) Java escape operation on a char[] input.
static String escapeJava(String text)
           Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.
static String escapeJava(String text, JavaEscapeLevel level)
           Perform a (configurable) Java escape operation on a String input.
static void escapeJavaMinimal(char[] text, int offset, int len, Writer writer)
           Perform a Java level 1 (only basic set) escape operation on a char[] input.
static String escapeJavaMinimal(String text)
           Perform a Java level 1 (only basic set) escape operation on a String input.
static void unescapeJava(char[] text, int offset, int len, Writer writer)
           Perform a Java unescape operation on a char[] input.
static String unescapeJava(String text)
           Perform a Java unescape operation on a String input.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

escapeJavaMinimal

public static String escapeJavaMinimal(String text)

Perform a Java level 1 (only basic set) escape operation on a String input.

Level 1 means this method will only escape the Java basic escape set:

This method calls escapeJava(String, JavaEscapeLevel) with the following preconfigured values:

This method is thread-safe.

Parameters:
text - the String to be escaped.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.
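The "same object when nothing changes" behaviour described for the return value can be sketched as follows. This is an assumption about how such an optimization is typically written, not Unbescape's actual code; SameInstanceSketch and escapeBackslashes are hypothetical names, and only backslashes are escaped to keep the example short.

```java
final class SameInstanceSketch {

    /** Escapes backslashes, allocating a buffer only once a change is actually needed. */
    static String escapeBackslashes(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null; // created lazily, on the first char that needs escaping
        int copied = 0;           // number of input chars already copied into out
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\\') {
                if (out == null) {
                    out = new StringBuilder(text.length() + 8);
                }
                out.append(text, copied, i).append("\\\\");
                copied = i + 1;
            }
        }
        if (out == null) {
            return text; // nothing was escaped: hand back the very same instance
        }
        return out.append(text, copied, text.length()).toString();
    }
}
```

With this shape, callers can cheaply detect that no escaping took place by comparing the result to the input with ==.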

escapeJava

public static String escapeJava(String text)

Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.

Level 2 means this method will escape:

This escape will be performed by using the Single Escape Chars whenever possible. Characters that must be escaped but have no associated SEC are written as UHEXA escapes (of the form \uFFFF).
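The "SEC whenever possible, UHEXA otherwise" rule can be sketched with the standard library. This is not Unbescape's implementation: Level2EscapeSketch is a hypothetical class, and the exact set of control characters it escapes (everything below U+0020 and above U+007E) is an assumption made for this example. Following the glossary, \' is not used as a SEC here.

```java
final class Level2EscapeSketch {

    // Characters that have a Single Escape Char, and the letter written after the backslash.
    private static final String SEC_CHARS = "\b\t\n\f\r\"\\";
    private static final String SEC_NAMES = "btnfr\"\\";

    static String escape(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int sec = SEC_CHARS.indexOf(c);
            if (sec >= 0) {
                // Prefer the short form when one exists.
                out.append('\\').append(SEC_NAMES.charAt(sec));
            } else if (c < 0x20 || c > 0x7E) {
                // Assumed rule: other control and non-ASCII chars become UHEXA escapes.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the loop works char by char, a codepoint above U+FFFF comes out as two UHEXA escapes, one per surrogate.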

This method calls escapeJava(String, JavaEscapeLevel) with the following preconfigured values:

This method is thread-safe.

Parameters:
text - the String to be escaped.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

escapeJava

public static String escapeJava(String text,
                                JavaEscapeLevel level)

Perform a (configurable) Java escape operation on a String input.

This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

All other String-based escapeJava*(...) methods call this one with preconfigured level values.

This method is thread-safe.

Parameters:
text - the String to be escaped.
level - the escape level to be applied, see JavaEscapeLevel.
Returns:
The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

escapeJavaMinimal

public static void escapeJavaMinimal(char[] text,
                                     int offset,
                                     int len,
                                     Writer writer)
                              throws IOException

Perform a Java level 1 (only basic set) escape operation on a char[] input.

Level 1 means this method will only escape the Java basic escape set:

This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the following preconfigured values:

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException
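The char[]/Writer contract above (start at offset, process len chars, write nothing for null input) can be illustrated with a stand-in method of the same shape. CharArrayModeSketch and its methods are hypothetical; the escaping is reduced to quotes and backslashes, and the argument checks are assumptions rather than documented Unbescape behaviour.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class CharArrayModeSketch {

    /** Escapes quotes and backslashes in text[offset .. offset+len) and streams the result. */
    static void escapeSlice(char[] text, int offset, int len, Writer writer) throws IOException {
        if (writer == null) {
            throw new IllegalArgumentException("writer cannot be null");
        }
        if (text == null) {
            return; // nothing at all is written for null input
        }
        if (offset < 0 || len < 0 || offset + len > text.length) {
            throw new IllegalArgumentException("invalid (offset, len) for the given array");
        }
        for (int i = offset; i < offset + len; i++) {
            char c = text[i];
            if (c == '"' || c == '\\') {
                writer.write('\\');
            }
            writer.write(c);
        }
    }

    /** Convenience wrapper that collects the output in memory. */
    static String escapeSliceToString(char[] text, int offset, int len) {
        StringWriter writer = new StringWriter();
        try {
            escapeSlice(text, offset, len, writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }
}
```

Writing straight to the Writer avoids building an intermediate String, which is the usual reason to prefer this mode when the input is already a slice of a larger buffer.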

escapeJava

public static void escapeJava(char[] text,
                              int offset,
                              int len,
                              Writer writer)
                       throws IOException

Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.

Level 2 means this method will escape:

This escape will be performed by using the Single Escape Chars whenever possible. Characters that must be escaped but have no associated SEC are written as UHEXA escapes (of the form \uFFFF).

This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the following preconfigured values:

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException

escapeJava

public static void escapeJava(char[] text,
                              int offset,
                              int len,
                              Writer writer,
                              JavaEscapeLevel level)
                       throws IOException

Perform a (configurable) Java escape operation on a char[] input.

This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

All other char[]-based escapeJava*(...) methods call this one with preconfigured level values.

This method is thread-safe.

Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if text is null.
level - the escape level to be applied, see JavaEscapeLevel.
Throws:
IOException

unescapeJava

public static String unescapeJava(String text)

Perform a Java unescape operation on a String input.

No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

This method is thread-safe.

Parameters:
text - the String to be unescaped.
Returns:
The unescaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no unescaping modifications were required (and no additional String objects will be created during processing). Will return null if text is null.

unescapeJava

public static void unescapeJava(char[] text,
                                int offset,
                                int len,
                                Writer writer)
                         throws IOException

Perform a Java unescape operation on a char[] input.

No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

This method is thread-safe.

Parameters:
text - the char[] to be unescaped.
offset - the position in text at which the unescape operation should start.
len - the number of characters in text that should be unescaped.
writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if text is null.
Throws:
IOException


Copyright © 2014 The UNBESCAPE team. All rights reserved.