
Specifying Delimiters

Delimiter List

You can define a set of delimiters, called a delimiter list, for any node in the hierarchical data structure. This delimiter list is used in the external data representation for that node and its descendants. A delimiter list defined for any non-root node overrides the effect of any ancestor node’s delimiter list on both the node itself and its descendants.

Delimiters are defined using the Delimiter List Editor, as illustrated in the following figure. The editor is invoked by clicking the delim property value field in the node's property dialog box and clicking the ellipsis (…) button, or by double-clicking the field. See Defining a Delimiter List for additional information.

Clicking within a field in the Delimiter List Editor enables the field for editing. After typing a value into a field, you must press Enter to set the value. Clicking the drop-down menu button in one of the following three fields displays its menu, as illustrated in the following figure.

  • Type
  • Optional
  • Terminator

Delimiter List Editor: Left Side

Delimiter List Editor: Right Side

Table 1 Delimiter List Editor Command Buttons

Command Action
Add Level Adds a new level after the selected level.
Add Delimiter Adds a new delimiter after the selected delimiter, or to the bottom of the list under the selected level.
Remove Deletes the selected line item (level or delimiter) from the list.
Remove All Deletes all items (levels and delimiters) from the list.
OK Saves your entries and closes the editor.
Cancel Discards your entries and closes the editor.

Delimiter Properties

Table 2 Delimiter Properties

Property Description
Level Assigns consecutive sets of delimiter parameters to delimited nodes in the Encoder node hierarchy. See Delimiter Levels for additional information.
Type Specifies how the delimiter is used. See Delimiter Type for additional information.
Precedence Indicates the priority of a certain delimiter, relative to other delimiters. See Precedence for additional information.
Optional Specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty. See Optional for additional information. Note: Does not apply to children of choice element nodes.
Terminator Specifies how delimiters are to be handled for a specific terminator node in the Encoder tree. See Terminator for additional information.
Bytes Specifies the characters (bytes) to use to end the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. The term “bytes” (by itself) always indicates end bytes. See Delimiter Characters (Bytes) for additional information.
Offset Offset of the delimited data field in bytes from the beginning of the data stream (byte 0). Value must be a non-negative integer; the default is 0.
Length Length of the data field in bytes, if it is of fixed length. Value must be a positive integer. Entering a value clears the Bytes field.
Detached When checked, indicates that the specified delimiter is a detached, or non-anchored, delimiter, and does not have to appear at a fixed position.
BegBytes Specifies the characters (bytes) to use to begin the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. See Delimiter Characters (Bytes) for additional information.
BegOffset Offset of the fixed-length data field in bytes from the beginning of the data stream (byte 0). Value must be a non-negative integer; the default is 0.
BegLength Length of the data field in bytes, if it is of fixed length and has a beginning delimiter. Value must be a positive integer. Entering a value clears the Bytes field.
BegDetached When checked, indicates that the specified delimiter is a detached (non-anchored) beginning delimiter, and does not have to appear at a fixed position.
Skip When checked, skips identical leading (begin) delimiters. The delimiters may be defined either as begin bytes or end bytes. The purpose of this flag is to facilitate parsing tabular data.
Collapse When checked, collapses identical, consecutive end delimiters into a single delimiter. As with the Skip flag, the purpose of this flag is to facilitate parsing tabular data.
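
The effect of the Skip and Collapse flags on tabular data can be approximated with a short Python sketch. The function name split_tabular and its parameters are hypothetical and are not part of the Encoder; the sketch assumes a single delimiter and only illustrates the behavior described in the table above.

```python
import re

def split_tabular(line, delim=" ", skip=False, collapse=False):
    """Split a line on delim, optionally mimicking the Skip and Collapse flags."""
    if not delim:
        raise ValueError("delimiter must not be empty")
    if skip:
        # Skip: drop identical delimiters that lead the data.
        while line.startswith(delim):
            line = line[len(delim):]
    if collapse:
        # Collapse: treat a run of identical consecutive delimiters as one.
        line = re.sub(f"(?:{re.escape(delim)})+", lambda m: delim, line)
    return line.split(delim)

print(split_tabular("  10   20 30", skip=True, collapse=True))
```

With both flags off, the same input yields empty fields for every extra space, which is why the flags are useful for column-aligned text.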

Delimiter Levels

Delimiter levels are assigned in order to those hierarchical levels of an Encoder that contain at least one node that is specified as being delimited. If none of the nodes at a particular hierarchical level is delimited, that hierarchical level is skipped in assigning delimiter levels.

Delimiter lists are typically specified on the root node, so that the list applies to the entire Encoder. The root node itself is typically not delimited, so that Level 1 would apply to those nodes that are children of the root node. See the following figure and example.

For example, if you want to parse the following data:

a^b|c^d|e

you might create a Custom Encoder as follows:

  • root
    • element_1
      • field_1
      • field_2
    • element_2
      • field_3
      • field_4
    • field_5

In this example, the delimiter list is specified on the root node, which is not delimited; therefore, the list has two levels:

  • Level 1
    • Delimiter |
  • Level 2
    • Delimiter ^

The Level 1 delimiter (|) applies to element_1, element_2, and field_5. The Level 2 delimiter (^) applies to field_1 - field_4.

If the root node is set to be delimited, the Level 1 delimiters will then apply to it. Using the above example, the Level 2 delimiter (^) would then apply to element_1, element_2, and field_5, and a new Level 3 delimiter would apply to field_1 - field_4.
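
As a rough illustration of how the two levels in this example map onto the tree, the following Python sketch splits the data one level at a time. The dictionary layout and the function name parse_levels are assumptions made for this illustration; the real Encoder is configured through the editor, not through code like this.

```python
# Hypothetical model of the example tree: a dict maps node names to children,
# and None marks a leaf field.
ROOT = {
    "element_1": {"field_1": None, "field_2": None},
    "element_2": {"field_3": None, "field_4": None},
    "field_5": None,
}

def parse_levels(data, children, delimiters, level=0):
    """Split data with the delimiter of the current level, then recurse."""
    parts = data.split(delimiters[level])
    result = {}
    for (name, grandchildren), part in zip(children.items(), parts):
        if grandchildren:
            # A non-leaf node: its children use the next delimiter level.
            result[name] = parse_levels(part, grandchildren, delimiters, level + 1)
        else:
            result[name] = part
    return result

parsed = parse_levels("a^b|c^d|e", ROOT, ["|", "^"])
print(parsed)
```

Here the list ["|", "^"] stands for Level 1 and Level 2 of the delimiter list defined on the undelimited root node.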

Delimiter lists can be much more complex than this very simple example. For instance, you can create multiple delimiters of different types at any given level, and you can specify a delimiter list on any node within the Encoder, not only on the root node as shown in the example. See Defining a Delimiter List for a step-by-step description of the procedure for creating a Delimiter List.

Delimiter Type

The Delimiter Type property specifies whether the delimiter is a terminator at the end of the byte sequence (normal), a separator between byte sequences in an array (repeat) or an escape sequence.

Table 3 Delimiter Type Options

Option Description
normal Indicates the delimiter is a normal delimiter.
repeat Indicates the delimiter is a delimiter that delimits repetitive fields (nodes). If a node is defined to be repetitive, then a repeat delimiter can be used to delimit the repetitive occurrences, while a normal delimiter terminates the repetition. For example, a^b^c1~c2~c3~c4~c5^, where '~' is a delimiter that delimits repetitive nodes and '^' is a normal delimiter that terminates repetitive nodes.
escape Indicates the delimiter is an escape delimiter. The purpose of an escape delimiter is to escape special bytes, such as delimiters, using predefined escape sequences. Once the bytes of the escape delimiter are matched, no action is taken except that the search is continued at the position immediately following the delimiter bytes.
quot-esc The quot-escape delimiter is used to escape special bytes using quotation style escaping, that is, whatever appears within the (double) quotes is escaped. For example, assume that ',' (comma) is a normal delimiter. To escape ',' in the data, either we can use an escape sequence such as data\,data or we can use quotation marks such as "data,data". The bytes defined in the quot-escape delimiter represent the quotation marks.
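
The interplay of normal and repeat delimiters in the sample a^b^c1~c2~c3~c4~c5^ can be sketched in Python as follows. The function parse_record is hypothetical, and it infers which field is repetitive from the presence of the repeat delimiter; in the Encoder, repetition is declared on the node itself.

```python
def parse_record(data, normal="^", repeat="~"):
    """Split on the normal delimiter, then split repeated occurrences."""
    fields = data.split(normal)
    if fields and fields[-1] == "":
        # The final normal delimiter terminates the last field; drop the
        # empty string that str.split leaves after it.
        fields.pop()
    return [f.split(repeat) if repeat in f else f for f in fields]

record = parse_record("a^b^c1~c2~c3~c4~c5^")
print(record)
```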

Escape Option

An escape delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.

For example, if there is a normal delimiter “+” at a given level, and we define an escape delimiter “\+” as shown in the following figure, then aaa+b\+c+ddd will parse as three fields: aaa, b\+c, and ddd. If the escape delimiter were not defined, the sequence would then parse as four fields: aaa, b\, c, and ddd.

If there is only an escape delimiter on a given level, however, it presents a “no delimiter defined” situation for delim and array nodes.
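
A minimal Python sketch of this matching rule, assuming one normal delimiter and at most one escape delimiter, is shown here. The function split_with_escape is a hypothetical name; the sketch only mirrors the behavior described above, in which the escape sequence is recognized, left in the data, and skipped.

```python
def split_with_escape(data, normal="+", escape="\\+"):
    """Split on the normal delimiter, passing over escape delimiter matches."""
    fields, current, pos = [], [], 0
    while pos < len(data):
        if escape and data.startswith(escape, pos):
            # Escape delimiter matched: keep its bytes in the field and
            # continue the search immediately after them.
            current.append(escape)
            pos += len(escape)
        elif data.startswith(normal, pos):
            # Normal delimiter matched: the current field ends here.
            fields.append("".join(current))
            current = []
            pos += len(normal)
        else:
            current.append(data[pos])
            pos += 1
    fields.append("".join(current))
    return fields

print(split_with_escape("aaa+b\\+c+ddd"))
print(split_with_escape("aaa+b\\+c+ddd", escape=None))
```

The first call yields three fields and the second yields four, matching the example in this section.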

Precedence

Precedence indicates the priority of a certain delimiter, relative to the other delimiters. Precedence is used to resolve delimiter conflicts when one delimiter is a copy or prefix of another. In case of equal precedence, the innermost prevails.

By default, all delimiters are at precedence 10, which means they are all treated equally; fixed fields are hard-coded at precedence 10. When child fields are parsed, delimiters on parent nodes are not considered; only the child’s delimiter (or, if it is a fixed field, its length) is used. The range of valid precedence values is from 1 to 100, inclusive. The higher the value, the higher the precedence. Delimiters with higher precedence have a greater chance to be matched.

Changing the precedence of a delimiter changes how it is applied to the input data stream. For example:

  • root
    • element (type delim, delimiter = “^”, repeat)
    • field_1 (type fixed, length = 5)
    • field_2 (type fixed, length = 8, optional)

Although this will parse “abcde12345678^zyxvuABCDEFGH”, it will not parse the text “abcde^zyxvuABCDEFGH”, even though the second fixed field is optional. The reason is that the element’s delimiter is ignored within the fixed field because they have the same precedence. If you want the element’s delimiter to be examined within the fixed field data, you must change its precedence, for example:

  • root
    • element (type delim, delimiter = “^”, repeat, precedence = 11)
    • field_1 (type fixed, length = 5)
    • field_2 (type fixed, length = 8, optional)

This will successfully parse the text “abcde^zyxvuABCDEFGH”.

A similar argument can be applied to delimited child nodes. The parser normally attempts to match the child delimiter; setting the parent delimiter’s precedence to 11 forces the parser to match the parent delimiter first.
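
The following Python sketch is a deliberately simplified toy model of this example, not the Encoder's actual algorithm. It assumes a repeating element that contains a 5-byte field followed by an optional 8-byte field, and it looks for the element delimiter inside the fixed-field data only when the delimiter's precedence exceeds the fixed-field precedence. The function name parse_elements and its parameters are hypothetical.

```python
def parse_elements(data, delim="^", delim_precedence=10, fixed_precedence=10):
    """Parse repeated (field_1, field_2) pairs; field_2 is optional."""
    delimiter_wins = delim_precedence > fixed_precedence
    pos, elements = 0, []
    while pos < len(data):
        field_1 = data[pos:pos + 5]
        if len(field_1) < 5:
            raise ValueError("field_1 is truncated")
        pos += 5
        if delimiter_wins and (pos == len(data) or data.startswith(delim, pos)):
            # The delimiter is examined first, so the optional field is absent.
            field_2 = None
        else:
            # Equal precedence: the fixed field consumes 8 bytes blindly.
            field_2 = data[pos:pos + 8]
            if len(field_2) < 8:
                raise ValueError("field_2 is truncated")
            pos += 8
        elements.append((field_1, field_2))
        if pos == len(data):
            break
        if not data.startswith(delim, pos):
            raise ValueError(f"expected delimiter at offset {pos}")
        pos += len(delim)
    return elements

print(parse_elements("abcde^zyxvuABCDEFGH", delim_precedence=11))
```

With the default precedence of 10, the same call raises ValueError, because field_2 swallows the delimiter along with the following seven bytes.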

Optional

The Optional property specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty.

Table 4 Optional Mode Options

Option Rule
never If the node is absent, the delimiter is not allowed in either input or output.
allow If the node is absent, the delimiter is allowed in input but will not be emitted in output.
cheer If the node is absent, the delimiter is allowed in input and will also be emitted in output.
force If the node is absent, the delimiter must appear in input and will be emitted in output. Note: Only this option allows trailing delimiters for a sequence of absent optional nodes.

As illustrative examples, consider the tree structures shown in the following figure and table, where the node a has a caret (^) as its delimiter, and the child nodes b, c, and d all have asterisks (*) as their delimiters.

  • Example 1: Child node c is optional. (Child nodes c and d must have different values for the match parameter.)

Option Input Output
never b*d^ b*d^
allow b**d^ b*d^
cheer b**d^ b**d^
force b**d^ b**d^
  • Example 2: Child nodes c and d are both optional.

Option Input Output
never b^ b^
allow b^, b*^, or b**^ b^
cheer b^, b*^, or b**^ b**^
force b**^ b**^
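
The Output columns of the two examples can be reproduced with a small Python sketch. It assumes that the child delimiter acts as a separator between emitted fields, that never and allow drop absent nodes entirely, and that cheer and force keep them as empty fields. The function emit_children is a hypothetical name, and the sketch covers output only, not input validation.

```python
def emit_children(values, mode, child_delim="*", parent_delim="^"):
    """Serialize child values; None marks an absent optional node."""
    keep_absent = mode in ("cheer", "force")
    fields = []
    for value in values:
        if value is not None:
            fields.append(value)
        elif keep_absent:
            # The absent node still contributes its delimiter.
            fields.append("")
    return child_delim.join(fields) + parent_delim

for mode in ("never", "allow", "cheer", "force"):
    print(mode, emit_children(["b", None, "d"], mode), emit_children(["b", None, None], mode))
```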

Terminator

The Terminator property specifies how delimiters are to be handled for a specific terminator node in the Encoder tree.

Table 5 Terminator Mode Options

Option Rule
never Specifies that the delimiter is not allowed to be a terminator in input and will not be emitted as terminator in output.
allow Specifies that the delimiter is allowed to be a terminator in input but will not be emitted as terminator in output.
cheer Specifies that the delimiter is allowed to be a terminator in input and will be emitted as terminator in output.
force Specifies that the delimiter must appear as a terminator in input and will also be emitted as terminator in output.

Consider the tree structure shown in the following figure, where the node a has a caret (^) as its delimiter, and its child nodes b and c have asterisks (*) as their delimiters.

Option Input Output
never c^ c^
allow c^ or c*^ c^
cheer c^ or c*^ c*^
force c*^ c*^
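
The table above can be restated as two small Python functions, one for input acceptance and one for output. Both names (accepts_terminator and emit_terminator) are hypothetical, and the sketch models only the last child node c and its parent delimiter, as in the table.

```python
def accepts_terminator(text, mode, value="c", delim="*", parent_delim="^"):
    """Return True if text is valid input for the given terminator mode."""
    bare = value + parent_delim                 # for example c^
    terminated = value + delim + parent_delim   # for example c*^
    if mode == "never":
        return text == bare
    if mode == "force":
        return text == terminated
    # allow and cheer accept either form on input.
    return text in (bare, terminated)

def emit_terminator(mode, value="c", delim="*", parent_delim="^"):
    """Return the output form for the given terminator mode."""
    terminator = delim if mode in ("cheer", "force") else ""
    return value + terminator + parent_delim

for mode in ("never", "allow", "cheer", "force"):
    print(mode, accepts_terminator("c*^", mode), emit_terminator(mode))
```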

Delimiter Characters (Bytes)

Note - There is essentially no limitation on what characters you can use as delimiters; however, you should avoid characters that can be confused with data or interfere with escape sequences, as described in Escape Option. The backslash (\) is normally used as an escape character; for example, the HL7 protocol uses a double backslash as part of an escape sequence that provides special text formatting instructions. Additionally, a colon (:) is used as a literal in system-generated time strings. This can interfere with recovery procedures - for example, following a domain shutdown.

Escape Sequences

Use a backslash (\) to escape special characters. The following table lists the currently supported escape sequences.

Table 6 Escape Sequences

Sequence Description
\\ Backslash
\b Backspace
\f Form feed
\n Newline
\r Carriage return
\t Tab
\ddd Octal number
\xdd Hexadecimal number

For octal values, the leading variable d can only be 0 - 3 (inclusive), while the other two can be 0 - 7 (inclusive). The maximum value is \377.

For hexadecimal values, the variable d can be 0 - 9 (inclusive) and A - F (inclusive, either upper or lower case). The maximum value is \xFF.
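
To make the rules in this section concrete, here is a minimal Python sketch that converts a delimiter string containing these escape sequences into raw bytes. The function decode_delimiter is a hypothetical helper, not part of the Encoder, and it assumes the conventional C-style byte values for the single-letter sequences.

```python
import re

# Byte values for the single-character escapes (conventional C meanings).
_SIMPLE = {"\\": 0x5C, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09}

# \xdd (two hex digits), \ddd (octal, first digit 0-3), or a simple escape.
_TOKEN = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|([0-3][0-7]{2})|([\\bfnrt]))")

def decode_delimiter(text):
    """Decode escape sequences in a delimiter definition into bytes."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        if text[pos] != "\\":
            out += text[pos].encode("latin-1")
            pos += 1
            continue
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported escape sequence at offset {pos}")
        hex_digits, octal_digits, simple = match.groups()
        if hex_digits:
            out.append(int(hex_digits, 16))
        elif octal_digits:
            out.append(int(octal_digits, 8))
        else:
            out.append(_SIMPLE[simple])
        pos = match.end()
    return bytes(out)

print(decode_delimiter(r"\r\n"), decode_delimiter(r"\x7C"), decode_delimiter(r"\377"))
```

A sequence such as \400 is rejected because its leading octal digit is outside the range 0 - 3.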

Multiple Delimiters

You can specify multiple delimiters at a given level; for example, if you specify |, ~, and ^ as delimiters for a specific level, the parser will accept any of these delimiters:

  • root
    • element (delimiters = “|”, “~”, “^”)
    • field_1 (delimiter = “#”)
    • field_2 (delimiters = “|”, “~”, “^”)

This will successfully parse the data abc|def, abc~def, and abc^def.
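
Accepting any one of several delimiters at a level can be sketched in Python with a regular-expression alternation. The function split_any is a hypothetical name; sorting the delimiters longest first is a choice made in this sketch so that a delimiter that is a prefix of another does not shadow it.

```python
import re

def split_any(data, delimiters):
    """Split data on whichever of the given delimiters occurs."""
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, data)

for sample in ("abc|def", "abc~def", "abc^def"):
    print(split_any(sample, ["|", "~", "^"]))
```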

Anchored and Detached Delimiters

Anchored delimiters must be the starting and ending characters of the specified element.

Begin and End Delimiters

Begin delimiters mark the beginning of a fixed-length field, whereas end delimiters mark the end of a field. Usually, the term “delimiter” by itself refers to an end delimiter. We use the term “end delimiters” for clarification when begin delimiters are also present.

Begin delimiters are used to signify the beginning of a fixed-length data field. Since the data field is of fixed length, no delimiter is required to mark the end of the field. Use the Begin Delimiter or Begin Delimiter Detached property to specify it.

Constant and Embedded Delimiters

Constant delimiters remain unchanged at runtime. Embedded delimiters are embedded in the data, and thus are determined dynamically at runtime. Standard embedded delimiters are specified by the Offset and Length delimiter properties, while embedded begin delimiters are specified by the BegOffset and BegLength delimiter properties.
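
As a loose illustration of an embedded delimiter, the Python sketch below reads the delimiter bytes from the data stream itself, using an offset and a length, and then uses them to split the data that follows. The function embedded_delimiter, the sample data, and the choice to split only the bytes after the delimiter are all assumptions made for this sketch.

```python
def embedded_delimiter(data, offset=0, length=1):
    """Read a delimiter of the given length from data at the given offset."""
    if offset < 0:
        raise ValueError("offset must be a non-negative integer")
    if length <= 0:
        raise ValueError("length must be a positive integer")
    delim = data[offset:offset + length]
    if len(delim) != length:
        raise ValueError("data is too short to contain the delimiter")
    return delim

data = b"HDR;a;b;c"                       # hypothetical input: delimiter at byte 3
delim = embedded_delimiter(data, offset=3, length=1)
fields = data[3 + len(delim):].split(delim)
print(delim, fields)
```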

This page (revision-13) was last changed on 15-Dec-08 15:33 PM, -0800 by NormS