Index Changes



At line 3 changed 1 line.
The following topics provide preliminary instructions on how to set up Custom Encoders in JBI projects. This material is scheduled to be enhanced by 12/19/2008. If you have any questions or problems, see the Java CAPS web site at [http://goldstar.stc.com/support].
The following topics provide basic information on how to set up Custom Encoders in JBI projects.
At line 5 removed 1 line.
At line 7 changed 2 lines.
* [Understanding the Encoder Framework|http://wiki.open-esb.java.net/Wiki.jsp?page=EncoderFramework]
* [About Data Parsing and Serialization|http://wiki.open-esb.java.net/Wiki.jsp?page=ParsingAndSerialization]
The following links provide you with information about Custom Encoders and how they work.
At line 8 added 3 lines.
* [Understanding the Encoder Framework|EncoderFramework]
* [About Data Parsing and Serialization|ParsingAndSerialization]
At line 11 changed 7 lines.
* [Creating the Abstract Message Definition|http://wiki.open-esb.java.net/Wiki.jsp?page=AbstractMessageDefinition]
* [Editing Encoding Properties|http://wiki.open-esb.java.net/Wiki.jsp?page=EncodingProperties]
* [Matching Data Patterns|http://wiki.open-esb.java.net/Wiki.jsp?page=DataPatterns]
* [Specifying Delimiters|http://wiki.open-esb.java.net/Wiki.jsp?page=Delimiters]
* [Defining a Delimiter List|http://wiki.open-esb.java.net/Wiki.jsp?page=DelimiterList]
* [Validating and Testing the Custom Message Definition|http://wiki.open-esb.java.net/Wiki.jsp?page=ValidateMessageDefinition]
* [Using Custom Encoders in JBI Projects|http://wiki.open-esb.java.net/Wiki.jsp?page=CustomEncodersInJBI]
The following links lead you through the details of configuring and using Custom Encoders.
At line 14 added 7 lines.
* [Applying Custom Encoding to an XSD|AbstractMessageDefinition]
* [Editing Encoding Properties|EncodingProperties]
* [Matching Data Patterns|DataPatterns]
* [Specifying Delimiters|Delimiters]
* [Defining a Delimiter List|DelimiterList]
* [Validating and Testing the Custom Message Definition|ValidateMessageDefinition]
* [Using Custom Encoders in JBI Projects|CustomEncodersInJBI]
At line 21 removed 613 lines.
!!!Understanding the Encoder Framework
An __Encoder__ is a bidirectional software component that transforms an XML message into a non-XML message, and vice versa. The term __encoding__ has a very specific meaning within this context: it is the act of transforming an XML message into a non-XML message. The act of transforming a non-XML message into an XML message is termed __decoding__. Despite its name, the Encoder performs both functions.
XML is used as a common data format for processing within GlassFish ESB. In general, most data used in external applications is in some non-XML, serialized format; hence, the need for an Encoder.
A highly simplified illustration of the data flow to and from GlassFish ESB is shown in the following diagram. The area to the right of the JBI boundary represents GlassFish ESB, while the area to the left of the boundary represents the external applications that communicate with GlassFish ESB.
[{Image src='Encoder_Hi-level.gif' width='' height='' align='left|center|right' }]
Three sets of information define the runtime behavior of an Encoder:
* __Encoder Type__, also known as encoding style, defines the high-level encoding rules for a specific type of encoding and applies globally to all encoders of that type. The specific type of encoding relates to the data format used by the external application or communications protocol that is sending data to, or receiving data from, GlassFish ESB. Examples include SAP, Oracle DBMS, HL7, SWIFT, and X12. Encoding rules include:\\
** A grammar to scan an input message in its external representation, and rules on mapping the result to the internal representation (an operation known as __decoding__ or __parsing__).\\
** Rules on generating the external representation of an output message from the internal representation (an operation called __encoding__ or __serialization__).\\
* __Detailed Encoding Rules__ are specific to a single instance of an Encoder Type. These rules include:\\
** Delimiters\\
** Field Lengths\\
** Data offsets\\
* The __Abstract Message Structure__ specifies the logical structure of the messages being processed. This metadata is represented as XML schema (XSD), and may be viewed and edited by an XSD viewer/editor.\\
!!Abstract Message Structure
The runtime message structure is composed of a hierarchical system of __nodes__. These nodes are characterized by terms indicating their relationships with each other:
!Parent, Child, and Sibling Nodes
Any subnode of a given node is called a __child__ node, and the given node, in turn, is the child’s __parent__. __Sibling__ nodes are nodes on the same hierarchical level under the same parent node. Nodes higher than a given node in the same lineage are __ancestors__ and those below it are __descendants__.
__Figure 1-1 Encoder Node Relationships__
[{Image src='Encoder_Node_Relationships.gif' width='' height='' align='left|center|right' }]
!Root and Leaf Nodes
The __root__ node is the highest node in the tree structure, and has no parent. This node is a global element and represents the entire message. It may have one or more child nodes, but can never have sibling nodes or be repeating. The name of the root node can be edited.
__Leaf__ nodes have no children, and normally carry the actual data from the message. They are of simple types such as string.
!Non-leaf Nodes
__Non-leaf__ nodes, which can have children, provide the framework through which this data is accessed and organized. They are of complex types.
There are two major types of non-leaf nodes (aside from a root node, which is a special case):
* __Sequence group__ nodes, which provide organizational grouping for purposes such as repetition. In XSD, they are of complexType of a sequence of elements.\\
* __Choice group__ nodes, which represent sets of alternatives, only one of which is valid at any given time for an instance of that node. For example, a choice node named {{order}} might have two children, respectively named {{domestic}} and {{overseas}}. For each order instance, only one of these children will be present. In XSD, they are of complexType of a choice group of elements.\\
!!Data Types
The basic node types are __fixedLength__ and __delimited__. See Encoding Properties for information about other node types.
* With fixedLength data, the length of the unit of data is always the same. The position of the data within the message string is described by __byte offset__ and __length__.\\
* With delimited data, the length of the unit of data is variable. Information is separated by a pre-determined system of delimiters defined within the properties of the Encoder (see Specifying Delimiters).\\
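As a rough illustration of the two basic node types, the following Python sketch reads a fixedLength field by byte offset and length, and splits delimited data on a delimiter. The function names and the sample records are hypothetical; they are not part of the Encoder framework.

```python
def read_fixed(data, offset, length):
    """Return the fixedLength field that starts at a byte offset."""
    return data[offset:offset + length]


def read_delimited(data, delimiter):
    """Split variable-length fields on a delimiter."""
    return data.split(delimiter)


# Hypothetical record: a 10-byte name followed by a 5-byte quantity.
record = b"ACME".ljust(10) + b"00042"
name = read_fixed(record, 0, 10)
quantity = read_fixed(record, 10, 5)

# The same kind of information as variable-length, delimited fields.
fields = read_delimited(b"ACME|42|widgets", b"|")
```

With fixedLength data, the padding is part of the field; with delimited data, each field is only as long as its content.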
!!!Creating the Abstract Message Definition
In the absence of a predefined representation of the metadata describing the data format, you must manually create the Abstract Message Definition. To begin this process, create an XSD and apply the Custom Encoder to it, as follows:
!To Apply the Custom Encoder to an XSD
1 In your project, right-click to access the project context menu and add a new XML Schema.
[{Image src='Project_CMenu_NewXSD.gif' width='' height='' align='left|center|right' }]
You will need to develop the XSD node structure to match the parsing of the serialized message stream being processed. This process is described in the topics following this one.
2 In the resulting XSD, right-click to access its context menu and select Encoder > Apply Custom Encoder.
[{Image src='XSD_CMenu_Apply_CE.gif' width='' height='' align='left|center|right' }]
3 Once the Encoder has been applied, a special {{encoding}} node will automatically be added as a child node of an {{annotation}} node.
[{Image src='Encoding_Node.gif' width='' height='' align='left|center|right' }]
4 By right-clicking the {{encoding}} node and selecting __Properties__, you can edit the encoding rules for the individual elements.
[{Image src='Encoding_Node_Props.gif' width='' height='' align='left|center|right' }]
5 After applying the Encoder, the context menu changes as shown in the following illustration. Reapplying the Encoder resets the parameters for all nodes to their default values. The node structure you have created will be preserved.
[{Image src='XSD_CMenu_Reapply_CE.gif' width='' height='' align='left|center|right' }]
!!!Editing Encoding Properties
Once the encoding style is applied, you can edit detailed encoding rules at the node level using the special {{encoding}} node under the element's {{annotation}} node.
!!Encoding Properties
The following figure shows the majority of encoding properties associated with various nodes.
__Figure 1-2 Encoding Properties (Root Node/Fixed Length/Encoded)__
[{Image src='Enc_Props_Root_FL_Enc.gif' width='' height='' align='left|center|right' }]
__Table 1-1 Encoding Properties__
|| Name|| Description
| Encoding Style| Specifies the encoding style, for example: {{customencoder-[version]}}.
| Node Type| Specifies the format for parsing and serialization. The options are:\\- {{group}}, which provides organizational grouping for purposes such as repetition. Does not apply to Choice Element or Field nodes.\\- {{array}}, which is a delimited structure. If repeated, occurrences are separated by the {{repeat}} delimiter. The last occurrence may be terminated by a {{normal}} delimiter. Does not apply to Choice Element nodes.\\- {{delimited}}, which is a delimited structure. If repeated, occurrences are separated by a {{normal}} delimiter. Does not apply to Choice Element nodes. See Specifying Delimiters for additional information.\\- {{fixedLength}}, which indicates a fixed length and is specified by a non-negative integer (zero indicates the end of the parent node data). Does not apply to Choice Element nodes.\\- {{transient}}, which appears only in an internal tree as a scratchpad field. It does not appear in the external data representation, and can only have {{transient}} node types as children.\\The default value is {{delimited}}.\\See also Node Type Default Values (following this table) for more information.
| Fixed Length Type| Displayed for {{fixedLength}} Node Type only. The options are:\\- {{regular}} \\- {{encoded}} \\- {{determined by regex match}} \\- {{deducted from end}}\\
| Encoded Field Length| Displayed only for {{fixedLength}} Node Type with the {{encoded}} option. Specifies the length of the field; the default value is {{0}}.
| Encoded Field Offset| Displayed only for {{fixedLength}} Node Type with the {{encoded}} option.
| Encoded Field Position| Displayed only for {{fixedLength}} Node Type with the {{encoded}} option.
| Length From End| Displayed only for {{fixedLength}} Node Type with the {{deducted from end}} option.
| Array Delimiter| Displayed for {{array}} Node Type only.
| Delimiter List| Opens the Delimiter List Editor. See Specifying Delimiters for information.
| Top| Applies to root node only. Specifies whether or not parsing/serializing encoding is supported for descendant nodes. The default value is {{true}} (checked box).
| Input Charset| Specifies the character set of the input data. This is only needed if parsing is done on byte array data and the character set in which the byte array data is encoded is not safe for delimiter scanning. If this property is not specified, the value specified for the {{Parsing Charset}} property will be used. This property is displayed only when the {{Top}} property is set to {{true}} (checked box). Applies to root node only. See Data Encoding for additional information.
| Output Charset| Specifies the character set if it needs to be different from the serializing character set. If this property is not specified, the value specified for the {{Serializing Charset}} property will be used. This property is displayed only when the {{Top}} property is set to {{true}} (checked box). Applies to root node only. See Data Encoding for additional information. Note: This character set may be unsafe for delimiter scanning.
| Parsing Charset| Specifies the character set used to decode byte array data into string during parsing. It is recommended to use UTF-8 for DBCS data, since the hex value of some ASCII delimiter may coincide with a hex value contained within a double-byte character. This property is displayed only when the {{Top}} property is set to {{true}} (checked box). See Data Encoding for additional information.
| Serializing Charset| Specifies the character set used to encode string data into byte array data during serialization of the data. This property is displayed only when the {{Top}} property is set to {{true}} (checked box). See Data Encoding for additional information.
| Order| Specifies the ordering of the selected node’s children during the parsing process. See Order Property (following this table) for additional information.\\- {{sequence}} specifies that the child nodes must appear in the sequence given in the metadata.\\- {{any}} specifies that the child nodes must remain grouped, but the groups can appear in any order.\\- {{mixed}} specifies that the child nodes can appear in any order.\\Note: Does not apply to __choice element__ nodes.
| NofN minN| Specifies the minimum number of child nodes that should appear.
| NofN maxN| Specifies the maximum number of child nodes that should appear.
| MinOcc| Applies to repeating nodes only. Specifies the minimum number of nodes that should appear.
| MaxOcc| Applies to repeating nodes only. Specifies the maximum number of nodes that should appear.
| Scavenger Chars| Specifies the characters to be stripped out when parsing the data, if they appear at the start of the byte stream for this element.
| Output Scavenger 1st Char| Specifies the character to be stripped out when serializing the data, if it appears as the first character of the output byte stream from this element.
| Begin Delimiter| Once beginning delimiters are specified, the value field displays the delimiter characters.
| Begin Delimiter Detached| Specifies whether the Begin Delimiter is anchored or detached. The default value is {{false}} (unchecked box), indicating an anchored delimiter.
| Delimiter| Displayed for {{delim}} Node Type only. Once delimiters are specified, the value field displays the delimiter characters. Does not apply to __choice element__ nodes.
| Escape Sequence| Displayed only when the {{top}} property is set to {{true}} (checked box).
| Fine Inherit| Displayed only when the {{top}} property is set to {{true}} (checked box). Enables the following delimiters to be inherited individually from the parent nodes:\\- begin\\- end\\- repeating\\The default value is {{false}} (unchecked box).
| Undefined Data Policy| Displayed only when the {{top}} property is set to {{true}} (checked box). The options are as follows:\\- {{map}} \\- {{ship}} \\- {{prohibit}} \\
!Node Type Default Values
The basic default value for the nodeType property is {{delimited}}. If, however, the node is the child of a parent node whose Node Type is {{fixedLength}} or {{transient}}, then the child takes on the same Node Type as the parent. See the following table for additional information.
%%information
Note - This rule does not apply to Choice Element nodes.\\
%%
__Table 1-2 Node Type Default Values__
|| Parent|| Child
| array| delimited
| delimited| delimited
| fixedLength| fixedLength
| group| delimited
| transient| transient
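The default rule can be summarized as a simple lookup. The following Python sketch only restates the table above; the names {{DEFAULT_CHILD_TYPE}} and {{default_node_type}} are hypothetical, and the sketch does not cover Choice Element nodes, to which the rule does not apply.

```python
# Default child Node Type, keyed by the parent's Node Type.
DEFAULT_CHILD_TYPE = {
    "array": "delimited",
    "delimited": "delimited",
    "fixedLength": "fixedLength",
    "group": "delimited",
    "transient": "transient",
}


def default_node_type(parent_type=None):
    """Return the default Node Type for a new child node.

    A node without a listed parent type falls back to the basic
    default, "delimited".
    """
    return DEFAULT_CHILD_TYPE.get(parent_type, "delimited")
```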
!Order Property
To illustrate how the {{order}} property works, consider the simple tree structure shown in the following diagram, where __a__ is an element node, __b__ is a non-repeating field node, and __c__ is a repeating field node. The value set for the {{order}} property allows the field nodes to appear as shown in the following table.
__Figure 1-3 Order Property Example__
[{Image src='Order_Property.gif' width='' height='' align='left|center|right' }]
__Table 1-3 Order Property Example__
|| Value|| Allowed Node Order
| sequence| b, c1, c2
| any| b, c1, c2, __or__ c1, c2, b
| mixed| b, c1, c2, __or__ c1, c2, b, __or__ c1, b, c2
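The three values can also be expressed as a small check on the order in which child nodes are observed. This is a minimal Python sketch of the rules in the table above, not the parser's actual algorithm; the function name {{is_allowed}} is hypothetical, and occurrence limits such as MinOcc and MaxOcc are ignored.

```python
from itertools import groupby


def is_allowed(order, declared, observed):
    """Check whether the observed child names satisfy the order rule.

    declared: child names in metadata order, for example ["b", "c"]
    observed: child names as they appear in the data, for example
              ["c", "c", "b"] for c1, c2, b
    """
    if any(name not in declared for name in observed):
        return False
    if order == "mixed":
        return True
    # Collapse runs of the same name: ["c", "c", "b"] -> ["c", "b"].
    groups = [name for name, _ in groupby(observed)]
    if len(groups) != len(set(groups)):
        return False  # a group was split, for example c1, b, c2
    if order == "any":
        return True
    if order == "sequence":
        positions = [declared.index(name) for name in groups]
        return positions == sorted(positions)
    raise ValueError("unknown order: " + order)
```

For example, {{is_allowed("any", ["b", "c"], ["c", "b", "c"])}} is false because the occurrences of __c__ are not grouped, while the same input is accepted under {{mixed}}.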
!!Data Encoding
For GlassFish ESB to correctly handle data in a byte-oriented protocol, the encoding method for inbound and outbound Encoders and the native code used for parsing must be specified in the Encoding properties. If you do not specify otherwise, UTF-8 is assumed to be the encoding method in each case.
Supporting UTF-8 by default allows the use of the Unicode character set in both ASCII and non-ASCII based environments without further specification. GlassFish ESB also supports ASCII for English, Japanese, and Korean locales, and the localized country-specific encoding methods shown in the following table.
The data encoding you specify when configuring the Encoding properties modifies the Java methods used for encoding and decoding. The encoding and decoding processes differ from one another depending upon which Java method you use, and whether you are encoding to or decoding from bytes or strings. The diagrams shown in About Data Parsing and Serialization illustrate these differences.
The encoding options available to you depend on the locale specified by your version of GlassFish ESB. UTF-8 is the default in all locales.
__Table 1-4 Partial Listing of Supported Encoding Options According to Locale__
|| English|| Japanese|| Korean|| Simplified Chinese|| Traditional Chinese
| UTF-8| UTF-8| UTF-8| UTF-8| UTF-8
| ASCII| ASCII| ASCII| GB2312| Big5
| EBCDIC| EUC-JP| EUC-KR| |
| UTF-16| SJIS| MS949| |
| | MS932| | |
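The recommendation to parse DBCS data as UTF-8 can be seen with a single character. The following Python sketch uses Python's own codecs rather than the Java methods used by the Encoder, and the choice of character and delimiter is only an example: in Shift_JIS, the second byte of the katakana letter PO has the same value as the ASCII pipe, so a byte-level scan for that delimiter finds a false match.

```python
text = "\u30dd"  # KATAKANA LETTER PO

sjis = text.encode("shift_jis")  # double-byte encoding
utf8 = text.encode("utf-8")

# The Shift_JIS trail byte is 0x7C, the same value as the ASCII pipe,
# so splitting the raw bytes on the delimiter cuts the character in two.
sjis_parts = sjis.split(b"|")

# Every byte of a multi-byte UTF-8 character is 0x80 or higher, so an
# ASCII delimiter can never appear inside it.
utf8_parts = utf8.split(b"|")
```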
!!!Matching Data Patterns
One of the parsing techniques that can be applied to the decoding of an input data stream is that of matching a specific byte pattern within a data sequence. You can accomplish this in a Custom Encoder by using the {{Match}} and {{Align}} field-node properties, when the {{Node Type}} is either {{delimited}} or {{fixedLength}}. During the decode operation, a field is successfully matched if it complies with the value of the {{Match}} property, interpreted according to the value of the {{Align}} property, as set for that field.
!!Defining Byte Patterns
The value you enter for the {{Match}} property defines the byte pattern for the data you want to match. As an example, a value of {{abc}} has been entered into the value field shown in the following figure. This provides a reference for the {{Align}} property, as shown in the next section.
Selecting the {{No Match}} check box reverses the situation, resulting in a match if the field contents (data) are __not equal to__ the byte pattern entered in the {{Match}} field.
__Figure 1-4 Match Property__
[{Image src='Match_Property.gif' width='' height='' align='left|center|right' }]
!!Specifying Pattern Alignment
The {{Align}} property supplements the {{Match}} property, specifying criteria on which to base the match. The default value is {{blind}}; if this is specified, the value of the {{Match}} property is ignored.
__Figure 1-5 Align Property Menu__
[{Image src='Field_Enc_Props_Align.gif' width='' height='' align='left|center|right' }]
__Table 1-5 Align Parameter Options__
|| Option|| Description
| blind| Always performs a match (default value). Any value set for the {{Match}} property is ignored.
| exact| When an input byte sequence exactly matches the specified byte pattern (for example, [{{abc}}]), the decode method matches the field to the input byte sequence.
| begin| When the leading bytes of an input byte sequence match the value set for the {{Match}} property (for example, [{{abc......}}]), the decode method matches the field to the input byte sequence.
| final| When the trailing bytes of an input byte sequence match the value set for the {{Match}} property (for example, [{{......abc}}]), the decode method matches the field to the input byte sequence.
| inter| When the input byte sequence contains a byte pattern that includes the value set for the {{Match}} property (for example, [{{...abc...}}]), the decode method matches the field to the input byte sequence.
| super| When an input byte sequence is a subsequence of the value set for the {{Match}} property (for example, [{{bc}}]), the decode method matches the field to the input byte sequence.
| oneof| If the value set for the {{Match}} property is a repeating pattern of the form {{<separator><value>...}} (for example, [{{\mon\wed\fri}}]), and the input byte sequence contains a byte pattern that matches one of the {{<value>}} entries (for example, [{{wed}}]), the decode method matches the field to the input byte sequence.
| regex| When an input byte sequence exactly matches the regular expression specified in the {{Match}} property, the decode method matches the field to the input byte sequence.
%%information
Note - The value entered for the {{match}} property is interpreted as a {{Latin1}} string, rather than following the specified encoding.\\
%%
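The align options can be read as different comparisons between the field data and the {{Match}} value. The following Python sketch is one possible interpretation of the table above, not the Encoder's implementation; the function name {{matches}} is hypothetical, and the {{oneof}} branch assumes that the first byte of the pattern is the separator.

```python
import re


def matches(align, pattern, data):
    """Return True if data satisfies pattern under the align option."""
    if align == "blind":
        return True  # the pattern is ignored
    if align == "exact":
        return data == pattern
    if align == "begin":
        return data.startswith(pattern)
    if align == "final":
        return data.endswith(pattern)
    if align == "inter":
        return pattern in data
    if align == "super":
        return data in pattern  # data is a subsequence of the pattern
    if align == "oneof":
        if not pattern:
            return False
        # Assumption: the first byte is the separator between values.
        separator, values = pattern[:1], pattern[1:]
        return any(value in data for value in values.split(separator))
    if align == "regex":
        return re.fullmatch(pattern, data) is not None
    raise ValueError("unknown align option: " + align)


# The Match value is interpreted as a Latin1 string.
pattern = "abc".encode("latin-1")
```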
!!!Specifying Delimiters
!!Delimiter List
You can define a set of delimiters, called a __delimiter list__, for any node in the hierarchical data structure. This delimiter list is used in the external data representation for that node and its descendants. A delimiter list defined for any non-root node overrides the effect of any ancestor node’s delimiter list on both the node itself and its descendants.
Delimiters are defined using the Delimiter List Editor, as illustrated in the following figure. The editor is invoked by clicking the {{delim}} property value field in the node's property dialog box and clicking the ellipsis (…) button, or by double-clicking the field. See Defining a Delimiter List for additional information.
Clicking within a field in the Delimiter List Editor enables the field for editing. After typing a value into a field, you must press {{Enter}} to set the value. Clicking the drop-down menu button in one of the following three fields displays its menu, as illustrated in the following figure.
* Type\\
* Optional\\
* Terminator\\
__Figure 1-6 Delimiter List Editor: Left Side__
[{Image src='Delim_List_Editor_Left.gif' width='' height='' align='left|center|right' }]
__Figure 1-7 Delimiter List Editor: Right Side__
[{Image src='Delim_List_Editor_Right.gif' width='' height='' align='left|center|right' }]
__Table 1-6 Delimiter List Editor Command Buttons__
|| Command|| Action
| Add Level| Adds a new level after the selected level.
| Add Delimiter| Adds a new delimiter after the selected delimiter, or to the bottom of the list under the selected level.
| Remove| Deletes the selected line item (level or delimiter) from the list.
| Remove All| Deletes all items (levels and delimiters) from the list.
| OK| Saves your entries and closes the editor.
| Cancel| Discards your entries and closes the editor.
!!Delimiter Properties
__Table 1-7 Delimiter Properties__
|| Property|| Description
| Level| Assigns consecutive sets of delimiter parameters to delimited nodes in the Encoder node hierarchy. See Delimiter Levels for additional information.
| Type| Specifies how the delimiter is used. See Delimiter Type for additional information.
| Precedence| Indicates the priority of a certain delimiter, relative to other delimiters. See Precedence for additional information.
| Optional| Specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty. See Optional for additional information. Note: Does not apply to children of __choice element__ nodes.
| Terminator| Specifies how delimiters are to be handled for a specific terminator node in the Encoder tree. See Terminator for additional information.
| Bytes| Specifies the characters (bytes) to use as delimiters for the specified level. See Delimiter Characters (Bytes) for additional information. Note: Entering a value for {{Length}} (see below) indicates a fixed length, and clears this field.
| Offset| Offset of the delimited data field in bytes from the beginning of the data stream (byte 0). Value must be a non-negative integer; the default is {{0}}.
| Length| Length of the data field in bytes, if it is of fixed length. Value must be a positive integer. Entering a value clears the {{Bytes}} field.
| Detached| When checked, indicates that the specified delimiter is a detached, or non-anchored, delimiter, and does not have to appear at a fixed position.
| BegBytes| Character (byte) to use as a beginning delimiter for a fixed-length data field.
| BegOffset| Offset of the fixed-length data field in bytes from the beginning of the data stream (byte 0). Value must be a non-negative integer; the default is {{0}}.
| BegLength| Length of the data field in bytes, if it is of fixed length and has a beginning delimiter. Value must be a positive integer. Entering a value clears the {{Bytes}} field.
| BegDetached| When checked, indicates that the specified delimiter is a detached (non-anchored) beginning delimiter, and does not have to appear at a fixed position.
| Skip| When checked, skips identical leading delimiters.
| Collapse| When checked, collapses identical, consecutive delimiters into a single delimiter.
!!Delimiter Levels
Delimiter levels are assigned in order to those hierarchical levels of an Encoder that contain at least one node that is specified as being delimited. If none of the nodes at a particular hierarchical level is delimited, that hierarchical level is skipped in assigning delimiter levels.
Delimiter lists are typically specified on the root node, so that the list applies to the entire Encoder. The root node itself is typically not delimited, so that __Level 1__ would apply to those nodes that are children of the root node. See the following figure and example.
__Figure 1-8 Encoder Hierarchical and Delimiter Levels__
[{Image src='OTD_Hierarchy_DelimLevels.gif' width='' height='' align='left|center|right' }]
For example, if you want to parse the following data:
{{{
a^b|c^d|e
}}}
you might create a Custom Encoder as follows:
* root\\
** element_1\\
*** field_1\\
*** field_2\\
** element_2\\
*** field_3\\
*** field_4\\
** field_5\\
In this example, the delimiter list is specified on the __root__ node, which is not delimited; therefore, the list has two levels:
* Level 1\\
** Delimiter |\\
* Level 2\\
** Delimiter ^\\
The __Level 1__ delimiter (__|__) applies to element_1, element_2, and field_5. The __Level 2__ delimiter (__^__) applies to field_1 - field_4.
If the root node is set to be delimited, the __Level 1__ delimiters will then apply to it. Using the above example, the __Level 2__ delimiter (^) would then apply to element_1, element_2, and field_5, and a new __Level 3__ delimiter would apply to field_1 - field_4.
Delimiter lists can be much more complex than this very simple example. For instance, you can create multiple delimiters of different types at any given level, and you can specify a delimiter list on any node within the Encoder, not only on the root node as shown in the example. See Defining a Delimiter List for a step-by-step description of the procedure for creating a Delimiter List.
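The two-level example above can be sketched as a recursive split, with one delimiter per level. This Python fragment is only an illustration under simplifying assumptions (a single delimiter per level, no escape or optional handling); the function name {{parse}} is hypothetical.

```python
def parse(data, delimiters):
    """Recursively split data, using one delimiter per level."""
    if not delimiters:
        return data
    head, rest = delimiters[0], delimiters[1:]
    return [parse(part, rest) for part in data.split(head)]


# Level 1 is "|", Level 2 is "^", as in the example above.
tree = parse("a^b|c^d|e", ["|", "^"])
```

The result groups {{a}} and {{b}} under the first element and {{c}} and {{d}} under the second. Because the sketch has no knowledge of the node structure, {{e}} also comes back as a one-item list rather than as the plain field that field_5 represents.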
!!Delimiter Type
The __Delimiter Type__ property specifies whether the delimiter is a terminator at the end of the byte sequence ({{normal}}), a separator between byte sequences in an array ({{repeat}}) or an escape sequence.
__Table 1-8 Delimiter Type Options__
|| Option|| Description
| normal| Terminator.
| repeat| Array separator.
| escape| Escape sequence.
| quot-esc| Quoted escape sequence. Whatever appears within the (double) quotes is escaped.
!Escape Option
An __escape__ delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.
For example, if there is a normal delimiter “__+__” at a given level, and we define an escape delimiter “__\+__” as shown in the following figure, then {{aaa+b\+c+ddd}} will parse as three fields: {{aaa}}, {{b\+c}}, and {{ddd}}. If the escape delimiter were not defined, the sequence would then parse as four fields: {{aaa}}, {{b\}}, {{c}}, and {{ddd}}.
__Figure 1-9 Delimiter Type - Escape__
[{Image src='DLE_Escape.gif' width='' height='' align='left|center|right' }]
If there is __only__ an escape delimiter on a given level, however, it presents a __no delimiter defined__ situation for {{delim}} and {{array}} nodes.
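The effect of an escape delimiter on the example above can be sketched as a scan that tests for the escape sequence before it tests for the normal delimiter. This is a minimal Python illustration, not the parser's implementation; the function name {{split_fields}} is hypothetical.

```python
def split_fields(data, delimiter, escape=None):
    """Split data on a delimiter, passing over escape sequences."""
    fields, current, i = [], "", 0
    while i < len(data):
        if escape and data.startswith(escape, i):
            # Recognized as an escape: kept as data, not as a delimiter.
            current += escape
            i += len(escape)
        elif data.startswith(delimiter, i):
            fields.append(current)
            current = ""
            i += len(delimiter)
        else:
            current += data[i]
            i += 1
    fields.append(current)
    return fields


with_escape = split_fields("aaa+b\\+c+ddd", "+", escape="\\+")
without_escape = split_fields("aaa+b\\+c+ddd", "+")
```

With the escape delimiter defined, the scan yields three fields; without it, the same input yields four.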
!!Precedence
__Precedence__ indicates the priority of a certain delimiter, relative to the other delimiters. Precedence is used to resolve delimiter conflicts when one delimiter is a copy or prefix of another. In case of equal precedence, the innermost prevails.
By default, all delimiters are at precedence 10, which means they are all considered the same; fixed fields are hard-coded at precedence 10. Delimiters on parent nodes are not considered when parsing the child fields; only the child’s delimiter (or, if it is a fixed field, its length) is considered. The range of valid precedence values is from 1 to 100, inclusive.
Changing the precedence of a delimiter changes the way it is applied to the input data stream. For example:
* root\\
** element (type delim, delimiter = “^”, repeat)\\
** field_1 (type fixed, length = 5)\\
** field_2 (type fixed, length = 8, optional)\\
Although this will parse {{"abcde12345678^zyxvuABCDEFGH"}}, it will __not__ parse the text {{"abcde^zyxvuABCDEFGH"}} even though the second fixed field is optional. The reason is that the element’s delimiter is ignored within the fixed field because they have the same precedence. If you want the element’s delimiter to be examined within the fixed field data, you must change its precedence, for example:
* root\\
** element (type delim, delimiter = “^”, repeat, __precedence = 11__)\\
** field_1 (type fixed, length = 5)\\
** field_2 (type fixed, length = 8, optional)\\
This will successfully parse the text {{"abcde^zyxvuABCDEFGH"}}.\\
A similar argument can be applied to delimited child nodes. The parser normally attempts to match the child delimiter; setting the precedence to 11 forces the parser to match the parent delimiter first.\\
!!Optional
The __Optional__ property specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty.
__Table 1-9 Optional Mode Options__
|| Option|| Rule
| never| Do not allow on input, do not emit on output (empty field between delimiters implies zero length data field).
| allow| Skip empty field if present; if absent, do not delimit on output.
| cheer| Skip empty field if present; if absent, delimit on output.
| force| Require empty, delimited field on input; always delimit on output. Note: Only this option allows trailing delimiters for a sequence of absent optional nodes.
As illustrative examples, consider the tree structures shown in the following figure and table, where the node __a__ has a caret ({{^}}) as its delimiter, and the child nodes __b__, __c__, and __d__ all have asterisks ({{*}}) as their delimiters.
* __Example 1:__ Child node __c__ is __optional__. (Child nodes __c__ and __d__ must have different values for the __match__ parameter.)\\
__Figure 1-10 Optional Mode Property (Example 1)__
[{Image src='OptionalDelim.gif' width='' height='' align='left|center|right' }]
|| Option|| Input|| Output
| never| __b*d^__| __b*d^__
| allow| __b**d^__| __b*d^__
| cheer| __b**d^__| __b**d^__
| force| __b**d^__| __b**d^__
* __Example 2:__ Child nodes __c__ and __d__ are both __optional__.\\
__Figure 1-11 Optional Mode Property (Example 2)__
[{Image src='OptionalDelim2.gif' width='' height='' align='left|center|right' }]
|| Option|| Input|| Output
| never| __b^__| __b^__
| allow| __b^__, __b*^__, or __b**^__| __b^__
| cheer| __b^__, __b*^__, or __b**^__| __b**^__
| force| __b**^__| __b**^__
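The output columns of both examples follow one pattern: {{never}} and {{allow}} drop absent optional nodes together with their delimiters, while {{cheer}} and {{force}} still emit the delimiter for each absent node. The following Python sketch reproduces only that output behavior, under the assumption that it generalizes from the two tables; the function name {{serialize}} is hypothetical, and input handling is not modeled.

```python
def serialize(children, mode, child_delim="*", parent_delim="^"):
    """Serialize child values; None marks an absent optional node."""
    if mode in ("never", "allow"):
        # Absent nodes are not delimited on output.
        parts = [value for value in children if value is not None]
    elif mode in ("cheer", "force"):
        # Absent nodes are still delimited on output.
        parts = ["" if value is None else value for value in children]
    else:
        raise ValueError("unknown optional mode: " + mode)
    return child_delim.join(parts) + parent_delim
```

For Example 1, {{serialize(["b", None, "d"], "cheer")}} keeps the empty position for __c__; for Example 2, {{serialize(["b", None, None], "allow")}} emits only __b__ and the parent delimiter.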
!!Terminator
The __Terminator__ property specifies how delimiters are to be handled for a specific terminator node in the Encoder tree.
__Table 1-10 Terminator Mode Options__
|| Option|| Rule
| never| Do not allow on input, do not emit on output (pure separator).
| allow| Allow on input, do not emit on output.
| cheer| Allow on input, always emit on output.
| force| Require on input, always emit on output (pure terminator).
Consider the tree structure shown in the following figure, where the node __a__ has a caret ({{^}}) as its delimiter, and its child nodes __b__ and __c__ have asterisks ({{*}}) as their delimiters.
__Figure 1-12 Terminator Mode Property Example__
[{Image src='TermTypeDelim.gif' width='' height='' align='left|center|right' }]
|| Option|| Input|| Output
| never| __c^__| __c^__
| allow| __c^__ or __c*^__| __c^__
| cheer| __c^__ or __c*^__| __c*^__
| force| __c*^__| __c*^__
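The rules in Table 1-10 can be read as two small decisions: whether a trailing child delimiter is acceptable on input, and whether one is written on output. The sketch below encodes just that reading for the last child in the example. The class {{TerminatorModeSketch}} and its methods are hypothetical and are not part of the Encoder API.

```java
// Hypothetical model of the Terminator mode rules for the last child node.
final class TerminatorModeSketch {
    enum Mode { NEVER, ALLOW, CHEER, FORCE }

    /** Returns true if the input ends in a way the mode permits. */
    static boolean accepts(Mode mode, String input, char childDelim, char parentDelim) {
        if (input.isEmpty() || input.charAt(input.length() - 1) != parentDelim) {
            return false; // this sketch expects the parent delimiter at the end
        }
        String body = input.substring(0, input.length() - 1);
        boolean terminated = !body.isEmpty() && body.charAt(body.length() - 1) == childDelim;
        switch (mode) {
            case NEVER: return !terminated; // pure separator
            case FORCE: return terminated;  // pure terminator
            default:    return true;        // ALLOW and CHEER accept both forms
        }
    }

    /** Writes the last child, adding the child delimiter only for CHEER and FORCE. */
    static String emit(Mode mode, String lastChild, char childDelim, char parentDelim) {
        boolean terminate = mode == Mode.CHEER || mode == Mode.FORCE;
        return lastChild + (terminate ? String.valueOf(childDelim) : "") + parentDelim;
    }

    public static void main(String[] args) {
        System.out.println(emit(Mode.CHEER, "c", '*', '^')); // prints c*^
    }
}
```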
!!Delimiter Characters (Bytes)
There is essentially no limitation on what characters you can use as delimiters; however, you obviously want to avoid characters that can be confused with data or interfere with escape sequences, as described in Escape Option. The backslash ({{\}}) is normally used as an escape character; for example, the HL7 protocol uses a double backslash as part of an escape sequence that provides special text formatting instructions.
%%information
Note - You should avoid using a colon ({{:}}) as a delimiter character, since it is used as a literal in system-generated time strings. This can interfere with recovery procedures, for example, following a Domain shutdown.\\
%%
!Escape Sequences
Use a backslash ({{\}}) to escape special characters. The following table lists the currently supported escape sequences.
__Table 1-11 Escape Sequences__
|| Sequence|| Description
| {{{\\}}}| Backslash
| \b| Backspace
| \f| Form feed
| \n| Newline
| \r| Carriage return
| \t| Tab
| \ddd| Octal number
| \xdd| Hexadecimal number
For octal values, the leading variable {{d}} can only be {{0}} - {{3}} (inclusive), while the other two can be {{0}} - {{7}} (inclusive). The maximum value is {{\377}}.
For hexadecimal values, the variable {{d}} can be {{0}} - {{9}} (inclusive) and {{A}} - {{F}} (inclusive, either upper or lower case). The maximum value is {{\xFF}}.
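To make the digit rules concrete, the following sketch converts escaped delimiter text into raw bytes. It is an illustration only: the class {{DelimiterEscapes}} and its {{unescape}} method are hypothetical, the Encoder's own parsing code is not shown in this guide, and the sketch assumes that unescaped characters fit in a single byte.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical decoder for the escape sequences listed in Table 1-11.
final class DelimiterEscapes {
    /** Converts text such as \x7C or \174 (written with one backslash) into bytes. */
    static byte[] unescape(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i++);
            if (c != '\\') {
                out.write(c); // assumption: literal characters are single-byte
                continue;
            }
            if (i >= text.length()) throw new IllegalArgumentException("dangling backslash");
            char e = text.charAt(i++);
            switch (e) {
                case '\\': out.write('\\'); break;
                case 'b':  out.write('\b'); break;
                case 'f':  out.write('\f'); break;
                case 'n':  out.write('\n'); break;
                case 'r':  out.write('\r'); break;
                case 't':  out.write('\t'); break;
                case 'x':  // two hexadecimal digits, maximum FF
                    out.write(parse(text, i, 2, 16));
                    i += 2;
                    break;
                default:   // three octal digits, leading digit 0-3, maximum 377
                    if (e < '0' || e > '3') {
                        throw new IllegalArgumentException("unknown escape: \\" + e);
                    }
                    out.write(parse(text, i - 1, 3, 8));
                    i += 2;
            }
        }
        return out.toByteArray();
    }

    private static int parse(String text, int start, int digits, int radix) {
        if (start + digits > text.length()) throw new IllegalArgumentException("truncated escape");
        int value = 0;
        for (int k = start; k < start + digits; k++) {
            int d = Character.digit(text.charAt(k), radix);
            if (d < 0) throw new IllegalArgumentException("bad digit: " + text.charAt(k));
            value = value * radix + d;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\x7C")[0]); // prints 124
    }
}
```

Restricting the leading octal digit to 0-3 is what keeps the value within one byte; the sketch rejects a leading digit of 4 or higher as an unknown escape.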
!!Multiple Delimiters
You can specify multiple delimiters at a given level; for example, if you specify {{|}}, {{~}}, and {{^}} as delimiters for a specific level, the parser will accept any of these delimiters:
* root\\
** element (delimiters = “{{|}}”, “{{~}}”, “{{^}}”)\\
** field_1 (delimiter = “{{#}}”)\\
** field_2 (delimiters = “{{|}}”, “{{~}}”, “{{^}}”)\\
This will successfully parse the data {{abc|def}}, {{abc~def}}, and {{abc^def}}.\\
__Figure 1-13 Multiple Delimiter Example__
[{Image src='DLE_Multiple.gif' width='' height='' align='left|center|right' }]
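The effect of listing several delimiters at one level can be sketched as a split on any character in a set. The class {{MultiDelimiterSplit}} below is a hypothetical illustration that works on single-character delimiters only; it does not represent how the Encoder's parser is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: any delimiter in the set ends the current field.
final class MultiDelimiterSplit {
    /** Splits data at every character that appears in the delimiter set. */
    static List<String> split(String data, String delimiters) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : data.toCharArray()) {
            if (delimiters.indexOf(c) >= 0) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing delimiter here
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("abc~def", "|~^")); // prints [abc, def]
    }
}
```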
!!Anchored and Detached Delimiters
Anchored delimiters must be the starting and ending characters of the specified element.
!!Begin and End Delimiters
Begin delimiters mark the beginning of a fixed-length field, whereas end delimiters mark the end of a field. Usually, the term “delimiter” by itself refers to an end delimiter. We use the term “end delimiters” for clarification when begin delimiters are also present.
Because the data field has a fixed length, no delimiter is required to mark its end. To specify a begin delimiter, use the {{Begin Delimiter}} or {{Begin Delimiter Detached}} property.
!!Constant and Embedded Delimiters
Constant delimiters remain unchanged at runtime. Embedded delimiters are embedded in the data, and thus are determined dynamically at runtime. Standard embedded delimiters are specified by the {{Offset}} and {{Length}} delimiter properties, while embedded begin delimiters are specified by the {{BegOffset}} and {{BegLength}} delimiter properties.
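The idea of an embedded delimiter can be sketched as reading the delimiter out of the data before splitting on it. The example makes several assumptions that this guide does not confirm: the offset is zero-based and counted from the start of the data, the sketch works on characters rather than bytes, and the class {{EmbeddedDelimiterSketch}}, its methods, and the sample data are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a delimiter that is carried inside the data.
final class EmbeddedDelimiterSketch {
    /** Reads the delimiter from the data: `length` characters starting at `offset`. */
    static String readDelimiter(String data, int offset, int length) {
        return data.substring(offset, offset + length);
    }

    /** Splits the data on whatever delimiter it declares for itself. */
    static String[] split(String data, int offset, int length) {
        String delimiter = readDelimiter(data, offset, length);
        // Pattern.quote keeps characters such as | from being read as regex syntax
        return data.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", split("HDR#A#B", 3, 1))); // prints HDR,A,B
    }
}
```

The same call on {{HDR|A|B}} splits on the pipe instead, which is the sense in which the delimiter is determined at runtime.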
!!!Defining a Delimiter List
As an example, we shall create a delimiter list for the simple Encoder structure shown in the following figure.
__Figure 1-14 Sample Encoder Tree__
[{Image src='Sample_Schema_Tree.gif' width='' height='' align='left|center|right' }]
!To create a delimiter list
1 In the XSD Editor, select the node for which you want to define a set of delimiters (this example uses the __root__ node, which is designated Element_1). By default, the value for the {{Node Type}} property is set to {{delimited}} and the value for the {{Delimiter List}} property appears as {{not specified}}.
%%information
Note - The {{Node Type}} values for elements and fields also are {{delimited}} by default, so they automatically pick up the delimiters specified for their ancestors unless you define new delimiter lists for them.\\
%%
2 Click the ellipsis (…) button in the {{Delimiter List}} value field to display the Delimiter List Editor, which is initially blank.
3 Click {{Add Level}} to add a level to the delimiter list, then click {{Add Delimiter}} to add a delimiter to the selected level. Click in the {{Bytes}} field to activate it for editing and type in the delimiter characters.
4 Press {{Enter}} to set the delimiter value. The list should appear as shown in the following figure.
__Figure 1-15 Delimiter List Editor - Add Delimiter__
[{Image src='E1_DLE_PipeEntered.gif' width='' height='' align='left|center|right' }]
5 Continue adding levels and delimiters as required, as shown in the following figure.
__Figure 1-16 Delimiter List Editor - Add Levels and Delimiters__
[{Image src='E1_DLE_3LevelsEntered.gif' width='' height='' align='left|center|right' }]
6 Click {{OK}} to close the editor and save your work.
7 The value for the {{Delimiter List}} property will now indicate the number of delimiter levels that are specified, as shown in the following figure.
__Figure 1-17 Element_1 - Delimiters Specified__
[{Image src='E1_Enc_Props_Set.gif' width='' height='' align='left|center|right' }]
8 The properties for Element_2 are displayed in the following figure. It automatically picks up the delimiters for __Level 2__, since the existing delimiter list is defined for Element_1. Defining another delimiter list here would override the existing list.
__Figure 1-18 Element_2 Properties__
[{Image src='E2_Enc_Props_NotSet.gif' width='' height='' align='left|center|right' }]
9 Leave the {{Node Type}} property for Field_1 set to {{delimited}}; it automatically picks up the delimiters for __Level 3__ from the list defined for Element_1, as displayed in the following figure. Again, the {{Delimiter List}} property remains {{not specified}}.
__Figure 1-19 Field_1 Properties__
[{Image src='F1_Enc_Props_NotSet.gif' width='' height='' align='left|center|right' }]
10 Once you have defined your delimiter list, you should test the Encoder to verify that it parses correctly.
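The way descendant nodes pick up successive levels of an ancestor's delimiter list can be sketched as a recursive split, one level per depth. The delimiter characters used here are placeholders chosen for the illustration, and the class {{DelimiterLevelsSketch}} is hypothetical; the sketch ignores optional and terminator modes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration: level 1 splits the root, level 2 its children, and so on.
final class DelimiterLevelsSketch {
    /** Splits data by the delimiter for `level`, then recurses into each part. */
    static Object parse(String data, String[] levels, int level) {
        if (level >= levels.length) {
            return data; // below the last level the data is a plain field
        }
        List<Object> parts = new ArrayList<>();
        for (String part : data.split(Pattern.quote(levels[level]), -1)) {
            parts.add(parse(part, levels, level + 1));
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] levels = {"|", "~", "^"}; // placeholder delimiters for levels 1 to 3
        System.out.println(parse("a^b~c|d", levels, 0)); // prints [[[a, b], [c]], [[d]]]
    }
}
```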
!!!Validating and Testing the Custom Message Definition
!!Validating the Custom Message Definition
You can validate the encoding rules, along with the message definition in XML format, by clicking the validation button in the XSD Editor. If encoding rules are present, they are validated after the XML grammar and semantics. The following figure shows example output that contains multiple errors.
__Figure 1-20 Example Validation Result__
[{Image src='Validation_Example.gif' width='' height='' align='left|center|right' }]
!!Testing the Encoder Runtime Behavior
The Encoder Tester allows you to test the Encoder's runtime behavior at design time. To display the tester dialog, right-click the XSD file to display its context menu and select {{Encoder > Test}}, as shown in the following figure.
__Figure 1-21 Starting the Encoder Tester__
[{Image src='XSD_CMenu_EncTest.gif' width='' height='' align='left|center|right' }]
The Test Encoding dialog is shown in the following figure. The various fields are described briefly in the table following the figure. After the Decode test is complete, the result is placed in an XML file inside the current project. This file can then be validated as described in the preceding section. There is no automatic method for validating the Encode result, however.
__Figure 1-22 Test Encoding Dialog__
[{Image src='Test_Encoding_Dialog.gif' width='' height='' align='left|center|right' }]
__Table 1-12 Test Encoding Dialog Fields__
|| Section|| Field Caption|| Description
| Meta| Select an Element| Specifies the top-level element whose structure you want to test.
| Meta|XSD File| Identifies the XSD file you have selected for testing.
| Input| Decode/Encode| Option buttons to select the direction of data flow for the test. Specifies whether encoding or decoding behavior is being tested.
| Input|From/To String| Specifies that the input or output data is in string format. If not checked, byte format is assumed.
| Input|Data File| Specifies the data file to use in the Decode test.
| Input|XML Source File| Specifies the source file to use in the Encode test.
| Input|Source/Result Coding| Specifies the encoding of the serialized data. See Data Encoding.
| Output| File Name| Specifies the file name to use for the test result.
| Output|Folder| Specifies the folder in which you want the output file to be placed.
| Output|Overwrite Output| Specifies whether or not you want to overwrite any existing output file having the same name.
| Output|Created File| Confirms that the output file has been created, along with the location.
| Debug| Verbose Level| Specifies the level of detail contained in the log file. The options are:\\- {{None}}\\- {{Info}}\\- {{Fine}}\\- {{Finer}}\\- {{Finest}}\\
!!!Using Custom Encoders in JBI Projects
The following procedure describes how to use a Custom Encoder in a JBI project.
!To Use a Custom Encoder in a JBI Project
# Import the XSD into a WSDL using the WSDL context menu.
# Configure the individual binding component's inbound or outbound message type as {{encoded}} and set the encoding style to {{customencoder-1.0}}. See the following figure as an example.
__Figure 1-23 File Message Property Configuration__
[{Image src='File_Msg_Props.gif' width='' height='' align='left|center|right' }]
!!!About Data Parsing and Serialization
The parsing and serializing operations require data to be in byte-array form, so different methods for encoding and decoding data must be used to accommodate different input and output data formats. These different methods incorporate various stages of character conversion using specific character sets.
!!Encoding Process
Internally, the encoder requires the data input and output to be in bytes. The encoding process uses the {{serializing charset}}, as illustrated in the following figure.
__Figure 1-24 Encoding Process__
[{Image src='Encoder_Coding_marshal.gif' width='' height='' align='left|center|right' }]
!!encodeToString() Method
The {{encodeToString()}} method requires conversion to produce an output string after encoding from a byte[] field. This method also requires conversion when encoding from a string field, since the parser requires the data in bytes, and conversion again to produce an output string. The {{encodeToString()}} process uses the {{serializing charset}}, as illustrated in the following figure.
__Figure 1-25 encodeToString()__
[{Image src='Encoder_Coding_marshalToString.gif' width='' height='' align='left|center|right' }]
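The two conversions described for a string field can be sketched with the standard Java charset classes. This is not the Encoder's {{encodeToString()}} implementation: the class {{EncodeToStringSketch}} is hypothetical, and serialization is reduced to appending one delimiter so that the conversion steps stay visible.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the conversions around serialization.
final class EncodeToStringSketch {
    /** String field -> bytes -> (stand-in serialization) -> bytes -> output String. */
    static String encodeToString(String field, char delimiter, Charset serializingCharset) {
        // First conversion: the serializer works on bytes.
        byte[] fieldBytes = field.getBytes(serializingCharset);
        byte[] delimiterBytes = String.valueOf(delimiter).getBytes(serializingCharset);

        // Stand-in for serialization: the field bytes followed by the delimiter bytes.
        byte[] serialized = new byte[fieldBytes.length + delimiterBytes.length];
        System.arraycopy(fieldBytes, 0, serialized, 0, fieldBytes.length);
        System.arraycopy(delimiterBytes, 0, serialized, fieldBytes.length, delimiterBytes.length);

        // Second conversion: bytes back to a String for the caller.
        return new String(serialized, serializingCharset);
    }

    public static void main(String[] args) {
        System.out.println(encodeToString("abc", '|', StandardCharsets.UTF_8)); // prints abc|
    }
}
```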
!!encodeToBytes() Method
The {{encodeToBytes()}} method requires conversion to produce bytes after encoding from a string field. Following serialization, this method also requires conversion to produce an output (in bytes) having a different format from that used by the parser. If the same format is desired, then the {{output charset}} is left undefined, the {{serializing charset}} property is substituted by default, and the double conversion is bypassed. The {{encodeToBytes()}} process uses both the {{serializing charset}} and the {{output charset}}, as illustrated in the following figure.
__Figure 1-26 encodeToBytes()__
[{Image src='Encoder_Coding_marshalToBytes.gif' width='' height='' align='left|center|right' }]
!!encodeToStream() Method
The {{encodeToStream()}} method encodes an XML representation of a message in the custom format and writes the result to an {{OutputStream}} object.
!!encodeToWriter() Method
The {{encodeToWriter()}} method encodes an XML representation of a message in the custom format and writes the result to a {{Writer}} object.
!!Decoding Process
Internally, the decoding process requires conversion when decoding to a string field, since the input is in bytes as required by the parser. The decoding process uses the {{parsing charset}}, as illustrated in the following figure.
__Figure 1-27 Decoding Process__
[{Image src='Encoder_Coding_unmarshal.gif' width='' height='' align='left|center|right' }]
!!decodeFromString() Method
The {{decodeFromString()}} method requires conversion of the input string, since the parser requires the data in bytes. This method requires a second conversion when decoding to a string field. The {{decodeFromString()}} process uses the {{parsing charset}}, as illustrated in the following figure.
__Figure 1-28 decodeFromString()__
[{Image src='Encoder_Coding_unmarshalFromString.gif' width='' height='' align='left|center|right' }]
!!decodeFromBytes() Method
The {{decodeFromBytes()}} method requires conversion if the input data has a different byte format from that used by the parser. If the same format is desired, then the {{input charset}} is left undefined, the {{parsing charset}} is substituted by default, and the double conversion is bypassed. After parsing, this method requires further conversion if decoding to a string field. The {{decodeFromBytes()}} process uses both the {{input charset}} and the {{parsing charset}}, as illustrated in the following figure.
__Figure 1-29 decodeFromBytes()__
[{Image src='Encoder_Coding_unmarshalFromBytes.gif' width='' height='' align='left|center|right' }]
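The bypass rule for an undefined {{input charset}} can be sketched as follows. The class {{DecodeFromBytesSketch}} and its methods are hypothetical, a {{null}} argument stands for an undefined input charset, and the parsing step itself is omitted; only the conversions before and after it are shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the conversions around parsing.
final class DecodeFromBytesSketch {
    /** Re-encodes input bytes into the parsing charset; skipped if no input charset is set. */
    static byte[] toParserBytes(byte[] input, Charset inputCharset, Charset parsingCharset) {
        if (inputCharset == null || inputCharset.equals(parsingCharset)) {
            return input; // same byte format: the double conversion is bypassed
        }
        return new String(input, inputCharset).getBytes(parsingCharset);
    }

    /** Further conversion applied after parsing when the target is a string field. */
    static String decodeField(byte[] parsedField, Charset parsingCharset) {
        return new String(parsedField, parsingCharset);
    }

    public static void main(String[] args) {
        byte[] input = "a|b".getBytes(StandardCharsets.UTF_16BE);
        byte[] forParser = toParserBytes(input, StandardCharsets.UTF_16BE, StandardCharsets.US_ASCII);
        System.out.println(forParser.length); // prints 3
    }
}
```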
!!decodeFromStream() Method
The {{decodeFromStream()}} method decodes custom-format data read from an {{InputStream}} object into an XML-encoded message.
!!decodeFromReader() Method
The {{decodeFromReader()}} method decodes custom-format data read from a {{Reader}} object into an XML-encoded message.
!!Setting Delimiters
The following figure illustrates how the delimiter gets set and passed into the parser.
__Figure 1-30 Setting Delimiters__
[{Image src='DelimiterToParser_FlowGFESB.gif' width='' height='' align='left|center|right' }]
As an example, if you select a delimiter in the XSD Editor by hex code (such as __\x7C__), it is passed directly into the parser. If you type the delimiter in as a pipe (__|__), however, then the pipe character is first converted to hex code, using the GUI’s encoding, and then sent to the parser.
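The dependence on the GUI's encoding can be seen in a short sketch that converts a typed delimiter into hex escapes. The class {{DelimiterToHex}} is hypothetical and only mimics the conversion step described above; it is not the XSD Editor's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the bytes sent to the parser depend on the charset used.
final class DelimiterToHex {
    /** Renders each byte of the typed delimiter as a \xdd escape. */
    static String toHexEscapes(String typed, Charset guiCharset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : typed.getBytes(guiCharset)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexEscapes("|", StandardCharsets.UTF_8)); // prints \x7C
    }
}
```

With {{StandardCharsets.UTF_16BE}} the same pipe character becomes two bytes, {{\x00\x7C}}, which is why entering the hex code directly removes any ambiguity.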
[Previous|P1]

This page (revision-28) was last changed on 09-Jun-09 16:43 PM, -0700 by rjacobus