<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>SNMP BC</title>
</head>
<body>
<h2><a name="mozTocId825701" class="mozTocH2"></a>SNMP Binding Component<br>
one pager</h2>
<p><img src="http://logos.sun.com/try/img/sun_logo.gif" title=""
alt="Sun logo" style="width: 73px; height: 31px;"><br>
</p>
<p><br>
Frank Kieviet / Philip Chan<br>
</p>
<p>Revisions:<br>
</p>
<table cellpadding="2" cellspacing="0" border="1"
style="text-align: left; width: 100%;">
<tbody>
<tr style="font-weight: bold;">
<td style="vertical-align: top;">Who<br>
</td>
<td style="vertical-align: top;">When<br>
</td>
<td style="vertical-align: top;">What<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">Frank Kieviet<br>
</td>
<td style="vertical-align: top;">12-01-06<br>
</td>
<td style="vertical-align: top;">Created<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">Frank Kieviet<br>
</td>
<td style="vertical-align: top;">12-05-06<br>
</td>
<td style="vertical-align: top;">Completed list of architectural
concerns<br>
</td>
</tr>
</tbody>
</table>
<br>
<font size="-1">(C) Copyright 2006 Sun Microsystems</font><br>
<hr style="width: 100%; height: 2px;">
<h3><a name="mozTocId192446" class="mozTocH3"></a>High level
description and applicability</h3>
<p>The SNMP BC can be used to monitor large numbers of SNMP variables
and forward the data for further processing. This further processing is
done in an SE. A description of possible SEs is outside the scope
of this project, but it is likely that this SE will perform further
slicing and dicing of the data, and store the resulting data in a
relational database. An SE that may be considered in this scenario is
the IEP.<br>
</p>
<p>The SNMP BC is meant to be the data collecting agent in a larger
monitoring application. Other components in this larger monitoring
application may include the aforementioned data analysis SE, but also
business rules engines, visual applications to provide operators with a
graphical view of the health of the system, alerting components, trend
analysis components, historical analysis modules, etc.<br>
</p>
<p>The focus of the SNMP BC in this larger application is on
monitoring. SNMP variables are monitored both by actively querying
SNMP variables and by passively receiving SNMP <span
style="font-family: monospace;">traps</span> and <span
style="font-family: monospace;">informs</span>. Active querying is
done by scheduled polling of specified SNMP variables.<br>
</p>
<p>Before propagation to the SE, data passes through a primary filter
to remove redundant data, e.g. a poll of a boolean variable will only
propagate <span style="font-style: italic;">changes</span> in the
monitored value. The primary filter is limited in functionality and is
not meant as a place to introduce elaborate business logic: the SE is
the place for that.<br>
</p>
<p>What the SNMP BC will not do: the SNMP BC does not support an
external interface that can be used to set SNMP variables or poll
variables on demand (negotiable). No transport protocols other than UDP are
supported. <br>
</p>
<h3><a name="mozTocId872530" class="mozTocH3"></a>Requirements<br>
</h3>
<ol>
<li>Should actively monitor variables on up to 100,000 agents using SNMP
<span style="font-family: monospace;">GET</span><br>
</li>
<li>Receive SNMP <span style="font-family: monospace;">TRAP</span>, <span
style="font-family: monospace;">INFORM</span></li>
<li>Support SNMPv1, SNMPv2c, SNMPv3</li>
<li>Dynamic configuration: additions to and changes in the configuration of
what to monitor without redeploying<br>
</li>
<li>Provisions for distributed deployment, fail over, redundancy<br>
</li>
</ol>
<h3><a name="mozTocId214450" class="mozTocH3"></a>A summary of SNMP</h3>
<p>This description of SNMP is confined to those aspects of SNMP that
are relevant for the design decisions in this document.<br>
</p>
<p>SNMP is a protocol for network management. SNMP is primarily a
request / reply protocol (<span style="font-family: monospace;">GET </span>and
<span style="font-family: monospace;">SET</span>). <span
style="font-family: monospace;">Agents</span> is a
term used for devices that listen for these <span
style="font-family: monospace;">GET</span> / <span
style="font-family: monospace;">SET</span> requests.
<span style="font-style: italic;">Managers</span> are applications that
send these requests to agents. Next to
<span style="font-family: monospace;">GET</span> / <span
style="font-family: monospace;">SET</span>, SNMP also supports a way
for agents to send unsolicited
notifications to the manager (<span style="font-family: monospace;">TRAP</span>
/ <span style="font-family: monospace;">INFORM</span>).</p>
<p>SNMP can support multiple networking protocols, but the most popular
one is UDP. SNMP over UDP is not a reliable protocol: packets may be
dropped. Each <span style="font-family: monospace;">GET</span>, <span
style="font-family: monospace;">SET</span>, <span
style="font-family: monospace;">TRAP</span> or <span
style="font-family: monospace;">INFORM</span> request or message is
transmitted in one UDP packet. Each packet contains only one data item.
UDP packets are of limited size (&lt; 500 bytes for some network
infrastructures). Data items are limited to a handful of primitive
values, essentially just ints and strings. Data is encoded using BER.<br>
</p>
<p>The data items that can be queried from or set on agents are
organized in a tree. This tree is comparable to a database schema. The
schema language used to describe this tree is a subset of ASN.1 called
the SMI. The nodes in the tree are addressable using object ids (OIDs).
OIDs are globally unique. An OID is a sequence of period-delimited
numbers. The tree, together with its OIDs, that an agent supports is called a MIB.
Although there have been efforts to standardize MIBs, there are more
than 2300 cataloged MIBs with over 1,000,000 OIDs.<br>
</p>
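<p>To make the addressing scheme concrete, the following is a minimal
sketch of an OID as a sequence of period-delimited numbers, with a check
for whether one OID lies in the subtree of another. The class <span
style="font-family: monospace;">Oid</span> and its methods are
hypothetical names used for illustration only; they are not part of this
design or of any SNMP library.<br>
</p>

```java
/** Hypothetical value class for an OID such as "1.3.6.1.2.1.1.3.0". */
final class Oid {
    private final long[] arcs;

    private Oid(long[] arcs) {
        this.arcs = arcs;
    }

    /** Parses a period-delimited OID; rejects empty, non-numeric, or negative parts. */
    static Oid parse(String text) {
        String[] parts = text.split("\\.", -1);
        long[] arcs = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            arcs[i] = Long.parseLong(parts[i]); // NumberFormatException on bad input
            if (arcs[i] < 0) {
                throw new IllegalArgumentException("negative number in OID: " + text);
            }
        }
        return new Oid(arcs);
    }

    /** True if this OID equals the given root or lies below it in the tree. */
    boolean isInSubtreeOf(Oid root) {
        if (root.arcs.length > arcs.length) {
            return false;
        }
        for (int i = 0; i < root.arcs.length; i++) {
            if (arcs[i] != root.arcs[i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < arcs.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(arcs[i]);
        }
        return sb.toString();
    }
}
```

<p>For example, <span style="font-family: monospace;">Oid.parse("1.3.6.1.2.1.1.3.0")</span>
(the standard <span style="font-style: italic;">sysUpTime</span>
instance) is in the subtree of <span style="font-family: monospace;">1.3.6.1.2.1.1</span>,
the standard system group.<br>
</p>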
<p>Although the data is organized in a tree, MIBs can describe tables
of data. This is done in a complicated way. Each cell in the table ends
up with its own OID. Cells can be read only one at a time through <span
style="font-family: monospace;">GET</span> requests.<br>
</p>
<p>MIBs typically refer to many data items, e.g. querying the MIB for
the Java VM results in more than 400 data items. Much of this data is
static configuration information; there are only a handful of data items
that contain information important for management, e.g. memory size,
and thread count.<br>
</p>
<h3><a name="mozTocId384277" class="mozTocH3"></a>Basic entities<br>
</h3>
<p><span style="font-weight: bold;">Unit of monitoring</span> -- The
unit of monitoring can be thought to be a complete MIB or individual
variables (OIDs) in a MIB. Considering the fact that the number of data
items in a MIB that are relevant for monitoring is a small portion (in
the case of the Java VM MIB this is about 1%), it makes more sense to
use individual OIDs as the unit of monitoring than to use a
complete MIB as the unit.<br>
</p>
<p><span style="font-weight: bold;">The structure of data being
processed</span> -- Monitored variables can be considered as
stand-alone entities, or can be thought of as nodes in the MIB tree.
For the latter it should be considered that MIBs can be converted into
XSDs (either a single small XSD describing all possible MIBs or one XSD
per MIB), and hence the data of a device can be converted into XML.
However, the consumer of the data, i.e. the consuming SE (e.g. the
IEP), will likely prefer scalar or tabular data over complex structures
because of the large number of MIBs involved. This large number of MIBs
is also the reason that the design of the BC will be simplified if the
BC does not need to have detailed knowledge of all MIBs involved.
Therefore the structure of the data (the entities) should be scalar or
tabular data.<br>
</p>
<h3><a name="mozTocId413146" class="mozTocH3"></a>Service level
requirements and concerns</h3>
<p></p>
<p><span style="font-weight: bold;">Performance: reduction of the
number of data
items produced</span> -- Components consuming the data produced by the
BC are likely more interested in events rather than raw data values.
E.g. an <span style="font-style: italic;">uptime</span> variable may
be used to detect if and when a system was restarted rather than
monitoring a monotonically increasing time value. Similarly, a boolean <span
style="font-style: italic;">system down</span> variable is only of
interest if its value changes. Since the number of monitored data items
will be large, it will be useful if there is a facility for primary
filtering of data to reduce the number of useless data items that the
consuming SE receives. It is not the intention to introduce business
rules here: this remains the responsibility of the consuming SE.<br>
</p>
<p><span style="font-weight: bold;">Performance: reduction of poll rate</span>
--
Many data items in a MIB do not change after a device is booted. An
example is a dump of the system properties of the Java VM MIB. Yet, an
SE may store these variables in a relational database once. If there is
a mechanism to relate one event to another, the number of useless data
item polls can be reduced. For example, the system properties of the
Java VM should only be polled if a system restart is detected. Again,
the intention is not to introduce business rules: that remains the
responsibility of the consuming SE.<br>
</p>
<p><span style="font-weight: bold;">Configuration</span> -- usually BCs
are configured through WSDL extensibility elements. However because of
the large number of OIDs to be configured, and because of the
requirement of dynamic configuration, this is not a feasible approach.
Hence, the configuration should reside in a data store that can be
manipulated independently of the deployment, e.g. a relational
database. Note that configuration is likely done by personnel other
than "comp app designers" or "deployers" and may need to be done
through remote consoles (e.g. a web browser). Also note that because of
the large number of monitored devices, configuration likely occurs
often (e.g. once or multiple times per day).<br>
</p>
<p><span style="font-weight: bold;">Scalability, reliability,
availability</span> --
for reasons of scalability, it should be possible to run the SNMP BC on
several machines at the same time (horizontal scalability). These
multiple instances should use
the same configuration store, yet they should not poll the same data
elements. For automatic fail over, different instances should be aware
of other instances going down and should be able to take over from
these failed instances automatically.<br>
The SNMP BC should be able to leverage multi core and hyper threaded
CPUs (vertical scaling) through a multi threaded design.<br>
</p>
<p><span style="font-weight: bold;">Extensibility, </span><span
style="font-weight: bold;">maintainability</span>, <span
style="font-weight: bold;">testability</span> -- The SNMP BC is
organized in a number of separate libraries that can be fully tested
outside of JBI. Details on the division into separate libraries are yet to
follow.<br>
</p>
<p><span style="font-weight: bold;">Manageability</span> -- Through JMX
information can be obtained about the internal state of the SNMP BC,
e.g. queue sizes, thread pool sizes, etc. Performance counters etc. can
be reset through JMX. Performance related configuration parameters are
set in the WSDL extensibility elements and will not be changeable
through JMX.<br>
</p>
<p><span style="font-weight: bold;">Security</span> -- The safe storage
of credentials in the configuration store is a special concern. File
system and/or database security will serve as the primary mechanism to
safeguard this information. Obfuscation (encryption with a hard-coded
key) can be used as a secondary mechanism.<br>
</p>
<p><span style="font-weight: bold;">Portability</span> -- The SNMP BC
depends on JDK 5. It may rely on classes in Glassfish and the Sun
JDK (negotiable). There will be no dependence on operating system or
hardware platform. </p>
<h3><a name="mozTocId279347" class="mozTocH3"></a>Data collection</h3>
<p><span style="font-weight: bold;">Event mechanism</span> -- An event
mechanism, i.e. event listeners and event generators, will be used as
the conceptual model for the monitoring of devices. Variable monitors
act like event listeners: they are triggered by events. When a variable
monitor is triggered by an event, it polls the monitored variable.
Event sources include timers and triggers. Variable monitors can also
act as event generators: for example a change in a variable <span
style="font-style: italic;">boot time</span> may cause an event that
will invoke other variable monitors.<br>
</p>
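<p>The listener / generator relationship described above can be sketched
as follows. All names (<span style="font-family: monospace;">MonitorListener</span>,
<span style="font-family: monospace;">EventBus</span>, <span
style="font-family: monospace;">SketchVariableMonitor</span>) are
hypothetical, the "poll" is only recorded in a list instead of issuing an
SNMP <span style="font-family: monospace;">GET</span>, and the follow-up
event is generated unconditionally, whereas a real monitor would let its
primary filter decide.<br>
</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical listener contract: anything that can be triggered by an event. */
interface MonitorListener {
    void onEvent(String eventId, EventBus bus);
}

/** Hypothetical in-process dispatcher; events stay within one SNMP BC instance. */
final class EventBus {
    private final Map<String, List<MonitorListener>> listeners =
            new HashMap<String, List<MonitorListener>>();

    void subscribe(String eventId, MonitorListener listener) {
        List<MonitorListener> targets = listeners.get(eventId);
        if (targets == null) {
            targets = new ArrayList<MonitorListener>();
            listeners.put(eventId, targets);
        }
        targets.add(listener);
    }

    /** Delivers the event synchronously to every listener subscribed to it. */
    void fire(String eventId) {
        List<MonitorListener> targets = listeners.get(eventId);
        if (targets == null) {
            return;
        }
        // Iterate over a copy so listeners may subscribe while being notified.
        for (MonitorListener listener : new ArrayList<MonitorListener>(targets)) {
            listener.onEvent(eventId, this);
        }
    }
}

/** Stand-in variable monitor: records a "poll" and optionally generates an event. */
final class SketchVariableMonitor implements MonitorListener {
    private final String variable;
    private final String eventToGenerate; // null if this monitor generates no event
    private final List<String> pollLog;

    SketchVariableMonitor(String variable, String eventToGenerate, List<String> pollLog) {
        this.variable = variable;
        this.eventToGenerate = eventToGenerate;
        this.pollLog = pollLog;
    }

    public void onEvent(String eventId, EventBus bus) {
        pollLog.add(variable); // a real monitor would issue an SNMP GET here
        if (eventToGenerate != null) {
            bus.fire(eventToGenerate); // unconditional in this sketch
        }
    }
}
```

<p>Subscribing a <span style="font-style: italic;">boot time</span>
monitor to a timer event and a <span style="font-style: italic;">system
properties</span> monitor to the event generated by the first monitor
yields the chaining described above: one timer event results in two
polls.<br>
</p>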
<p><span style="font-weight: bold;">Variable monitors</span> -- each
monitored variable will have a variable monitor which is responsible
for invoking the <span style="font-family: monospace;">GET</span>
operation. Variable monitors are invoked by events, have primary
filters which may cause other events to be thrown, and may produce
output that is sent to the SE. Note that variables may be scalar or
tabular.<br>
</p>
<p><span style="font-weight: bold;">Trap monitors</span> -- a trap
monitor is invoked by an SNMP <span style="font-family: monospace;">trap</span>
or <span style="font-family: monospace;">inform</span>. Similarly to
variable monitors, it invokes primary filters which in turn may cause
events, and may produce output that is sent to the SE.<br>
</p>
<p><span style="font-weight: bold;">Timer events</span> -- Schedules
express a series of points in time at which a timer event should be
caused. E.g. every 5 seconds between 09:00 and 18:00 from Monday till
Friday.<br>
</p>
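<p>As an illustration of such a schedule, the sketch below computes the
next firing time for "every N seconds within a daily window, Monday to
Friday". The class <span style="font-family: monospace;">WeekdaySchedule</span>
is a hypothetical name, the window end is assumed to be exclusive, and
the sketch uses <span style="font-family: monospace;">java.time</span>,
which requires a newer JDK than the JDK 5 mentioned under Portability.<br>
</p>

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical schedule: fire every 'period' inside a daily window, Monday to Friday. */
final class WeekdaySchedule {
    private final Duration period;
    private final LocalTime windowStart; // inclusive
    private final LocalTime windowEnd;   // exclusive (an assumption of this sketch)

    WeekdaySchedule(Duration period, LocalTime windowStart, LocalTime windowEnd) {
        if (period.toMillis() <= 0 || !windowStart.isBefore(windowEnd)) {
            throw new IllegalArgumentException("invalid period or window");
        }
        this.period = period;
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
    }

    /** Returns the first firing time strictly after 'now'. */
    LocalDateTime nextAfter(LocalDateTime now) {
        LocalDate day = now.toLocalDate();
        for (int i = 0; i < 8; i++) { // eight days always contain a usable weekday
            if (isWeekday(day)) {
                LocalDateTime start = day.atTime(windowStart);
                LocalDateTime candidate = start;
                if (!now.isBefore(start)) {
                    // Number of whole periods elapsed since the window opened, plus one.
                    long steps = Duration.between(start, now).toMillis() / period.toMillis() + 1;
                    candidate = start.plus(period.multipliedBy(steps));
                }
                if (candidate.isBefore(day.atTime(windowEnd))) {
                    return candidate;
                }
            }
            day = day.plusDays(1);
        }
        throw new IllegalStateException("no firing time found");
    }

    private static boolean isWeekday(LocalDate day) {
        DayOfWeek d = day.getDayOfWeek();
        return d != DayOfWeek.SATURDAY && d != DayOfWeek.SUNDAY;
    }
}
```

<p>With a period of 5 seconds and a window of 09:00 to 18:00, a call
shortly before 18:00 on a Friday returns 09:00 on the following Monday.<br>
</p>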
<p><span style="font-weight: bold;">Primary filters</span> -- variable
monitors can have one or more primary filters. If there are multiple
filters, these filters are not put in series but are put in parallel
and run independently from each other.
Here are a few examples of primary filters:<br>
</p>
<ul>
<li>state change</li>
<ul>
<li>filters out all values that are equal to the previously
monitored value if there is a previously monitored value<br>
</li>
<li>optional: event to generate on state change</li>
<li>output: new value, flag if there was a previously monitored
value</li>
</ul>
<li>monotone</li>
<ul>
<li>filters out all values that are greater than the previously
monitored value if there is a previously monitored value</li>
<li>optional: event to generate when a value is not filtered out<br>
</li>
<li>output: new value, flag if there was a previously monitored
value</li>
</ul>
<li>band</li>
<ul>
<li>filters out all values that are within the specified interval</li>
<li>optional: event to generate when a value is not filtered out<br>
</li>
<li>output: value</li>
</ul>
<li>reachable</li>
<ul>
<li>filters out all values, but produces a value if the variable
could not be polled</li>
<li>optional: event to generate when a value is not filtered out<br>
</li>
<li>output: id</li>
</ul>
<li>batching limit filter</li>
<ul>
<li>limits the number of traps: drops or batches traps if their
frequency exceeds a configured value</li>
<li>optional: event to generate when a value is not filtered out<br>
</li>
<li>output: last or aggregated value<br>
</li>
</ul>
</ul>
Note that filters are stateful: e.g. they can retain their last
monitored value. This memory is not persistent, and is not shared
between multiple instances of the SNMP BC.<br>
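<p>A minimal sketch of three of these filters follows, to show what
"stateful" means in practice. The interface <span
style="font-family: monospace;">PrimaryFilter</span> and the class names
are hypothetical, values are assumed to be numeric (<span
style="font-family: monospace;">long</span>), and the optional event
generation and the "previously monitored value" output flag are left out
for brevity.<br>
</p>

```java
/** Hypothetical filter contract for this sketch. */
interface PrimaryFilter {
    /** Returns true if the value should be propagated, false if it is filtered out. */
    boolean accept(long value);
}

/** Filters out values equal to the previously monitored value, if there is one. */
final class StateChangeFilter implements PrimaryFilter {
    private boolean hasPrevious;
    private long previous;

    public boolean accept(long value) {
        boolean pass = !hasPrevious || value != previous;
        hasPrevious = true;
        previous = value;
        return pass;
    }
}

/** Filters out values greater than the previously monitored value, if there is one. */
final class MonotoneFilter implements PrimaryFilter {
    private boolean hasPrevious;
    private long previous;

    public boolean accept(long value) {
        // Following the description literally: only strictly greater values are dropped.
        boolean pass = !hasPrevious || value <= previous;
        hasPrevious = true;
        previous = value;
        return pass;
    }
}

/** Filters out values within the closed interval [low, high]; keeps no state. */
final class BandFilter implements PrimaryFilter {
    private final long low;
    private final long high;

    BandFilter(long low, long high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        this.low = low;
        this.high = high;
    }

    public boolean accept(long value) {
        return value < low || value > high;
    }
}
```

<p>Applied to an <span style="font-style: italic;">uptime</span>
variable, the monotone filter passes the first value, drops every
increasing value after it, and passes the value again once it falls
back, which is the restart case described earlier.<br>
</p>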
<p><span style="font-weight: bold;">Buffering</span> -- traps and data
monitoring may produce data at a rate higher than the consuming SE can
process. For this the BC has a limited buffer. If this buffer
overflows, data will be dropped from the buffer. Data with a lower
priority will be dropped from the buffer before data with a higher
priority. Dropping data from the buffer is highly undesirable, and
special measures will be taken so that this situation may only occur if
there is a flood of traps. Should data be dropped, this will be logged
and alerts will be raised.<br>
</p>
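<p>One possible shape for such a buffer is sketched below: a bounded
buffer that, on overflow, drops the item with the lowest priority and
counts the drop so that it can be logged and alerted on. The class names
<span style="font-family: monospace;">BufferedItem</span> and <span
style="font-family: monospace;">DroppingBuffer</span> are hypothetical,
and dropping the oldest item among equal priorities is an assumption of
this sketch, not a decision made in this document.<br>
</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical unit of buffered data with the priority from its monitor configuration. */
final class BufferedItem {
    final String payload;
    final int priority; // higher value = more important

    BufferedItem(String payload, int priority) {
        this.payload = payload;
        this.priority = priority;
    }
}

/** Bounded FIFO buffer that sheds the lowest-priority item when it overflows. */
final class DroppingBuffer {
    private final int capacity;
    private final List<BufferedItem> items = new ArrayList<BufferedItem>();
    private int dropped;

    DroppingBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds an item; returns the item dropped to make room, or null if none was dropped. */
    synchronized BufferedItem offer(BufferedItem incoming) {
        items.add(incoming);
        if (items.size() <= capacity) {
            return null;
        }
        // Linear scan for the lowest priority; the strict '<' keeps the oldest among ties.
        int victim = 0;
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i).priority < items.get(victim).priority) {
                victim = i;
            }
        }
        dropped++; // a real implementation would also log and raise an alert here
        return items.remove(victim);
    }

    /** Removes and returns the oldest remaining item, or null if the buffer is empty. */
    synchronized BufferedItem poll() {
        return items.isEmpty() ? null : items.remove(0);
    }

    synchronized int droppedCount() {
        return dropped;
    }
}
```

<p>The linear scan keeps the sketch short; with the buffer sizes implied
by the requirements, a structure indexed by priority would likely be
needed.<br>
</p>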
<p>To reduce the likelihood of dropped data, most operations are
buffered and some types of operations take precedence over other types
of operations:<br>
</p>
<ol>
<li>delivery to NMR (buffered), reading trap events, reading get
replies</li>
<li>invoking primary filters for trap monitors<br>
</li>
<li>event triggers due to trap monitor primary filters<br>
</li>
<li>event triggers due to variable monitor primary filters<br>
</li>
<li>timer event triggers (delayed)</li>
</ol>
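<p>The precedence list above could be realized with a priority queue of
pending operations, as in the sketch below. The enum <span
style="font-family: monospace;">OperationType</span> and the classes
<span style="font-family: monospace;">Operation</span> and <span
style="font-family: monospace;">OperationQueue</span> are hypothetical
names; the enum constants are declared in the same order as the list, so
that a lower ordinal means higher precedence, and operations of the same
type are served in submission order.<br>
</p>

```java
import java.util.PriorityQueue;

/** Hypothetical operation classes, declared from highest to lowest precedence. */
enum OperationType {
    DELIVERY_AND_SOCKET_READS, // delivery to NMR, reading trap events, reading get replies
    TRAP_PRIMARY_FILTERS,
    TRAP_FILTER_EVENTS,
    VARIABLE_FILTER_EVENTS,
    TIMER_EVENTS
}

/** A pending operation, ordered by type first and submission order second. */
final class Operation implements Comparable<Operation> {
    final OperationType type;
    final long sequence;
    final String label;

    Operation(OperationType type, long sequence, String label) {
        this.type = type;
        this.sequence = sequence;
        this.label = label;
    }

    public int compareTo(Operation other) {
        int byType = type.compareTo(other.type); // enums compare by declaration order
        if (byType != 0) {
            return byType;
        }
        return Long.compare(sequence, other.sequence);
    }
}

/** Hands out pending operations in precedence order. */
final class OperationQueue {
    private final PriorityQueue<Operation> queue = new PriorityQueue<Operation>();
    private long nextSequence;

    synchronized void submit(OperationType type, String label) {
        queue.add(new Operation(type, nextSequence++, label));
    }

    /** Returns the label of the next operation to run, or null if none is pending. */
    synchronized String next() {
        Operation op = queue.poll();
        return op == null ? null : op.label;
    }
}
```

<p>With strict precedence, timer events are served only when nothing
more urgent is pending, which matches the "(delayed)" remark in the
list.<br>
</p>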
<span style="font-weight: bold;">Reliability</span> -- the UDP protocol
is not reliable. Replies to <span style="font-family: monospace;">get</span>
requests may be lost. The SNMP BC will monitor this and retry the <span
style="font-family: monospace;">get</span> request after timeout.<br>
<br>
The SNMP BC will not persist any data. Data may be lost in the case of
a system crash and in the case of a trap flood as described above.<br>
<br>
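<p>The retry behavior can be sketched as a simple loop. The interface
<span style="font-family: monospace;">GetAttempt</span> stands in for
the actual UDP exchange and is hypothetical, as is the <span
style="font-family: monospace;">maxAttempts</span> parameter: this
document does not state how many retries are made.<br>
</p>

```java
/** Hypothetical abstraction of one SNMP GET attempt over UDP. */
interface GetAttempt {
    /** Returns the reply value, or null if no reply arrived within the timeout. */
    String send(long timeoutMillis);
}

final class RetryingGet {
    private RetryingGet() {
    }

    /**
     * Repeats the attempt until a reply arrives or maxAttempts is reached.
     * Returns null when every attempt timed out, i.e. the variable could not be polled.
     */
    static String get(GetAttempt attempt, long timeoutMillis, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int i = 0; i < maxAttempts; i++) {
            String reply = attempt.send(timeoutMillis);
            if (reply != null) {
                return reply;
            }
        }
        return null;
    }
}
```

<p>A <span style="font-family: monospace;">null</span> result is the
case that the <span style="font-style: italic;">reachable</span> primary
filter is meant to report.<br>
</p>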
<span style="font-weight: bold;">Groups</span> -- variable monitors can
be organized in groups. Monitors for OIDs in the same group are guaranteed to be
running on the same SNMP BC instance. This guarantee is necessary for
events: events do not span multiple SNMP BC instances. If no group ID
is specified, a group id is generated based on the agent id.<br>
<br>
An instance of an SNMP BC can take
responsibility of multiple groups. In this way multiple instances of
the SNMP BC can be active at the same time without multiple instances
of the same variable monitor being active concurrently.<br>
TBD:<br>
<ul>
<li>how instances of an SNMP BC are mapped to groups (can be done by
partitioning based on hash(groupid)%nodeid)<br>
</li>
<li>how responsibility for groups is transferred to other instances
of the SNMP BC if an instance of the SNMP BC goes down</li>
</ul>
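<p>Since the mapping is still TBD, the following is only one possible
reading of hash-based partitioning: the hash of the group id modulo the
number of running instances selects the owning instance. The class <span
style="font-family: monospace;">GroupPartitioner</span>, the <span
style="font-family: monospace;">instanceCount</span> parameter, and the
"agent:" prefix for generated group ids are all assumptions made for
illustration.<br>
</p>

```java
/** Hypothetical helper that maps groups to SNMP BC instances by hashing. */
final class GroupPartitioner {
    private GroupPartitioner() {
    }

    /** Returns the configured group id, or one derived from the agent id if none is set. */
    static String effectiveGroupId(String groupId, String agentId) {
        return groupId != null ? groupId : "agent:" + agentId;
    }

    /** Returns the index (0 .. instanceCount - 1) of the instance that owns the group. */
    static int ownerOf(String groupId, int instanceCount) {
        if (instanceCount < 1) {
            throw new IllegalArgumentException("instanceCount must be at least 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(groupId.hashCode(), instanceCount);
    }
}
```

<p>A plain modulo remaps most groups whenever the number of instances
changes, so this sketch does not address the fail over question listed
above.<br>
</p>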
<p><span style="font-weight: bold;">Configuration store</span> -- the
configuration data store will have at least the following entities:<br>
</p>
<ul>
<li>Variable monitor</li>
<ul>
<li>agent id</li>
<li>monitored variable (the OID to query) (TBD: should there be a
mapping from OIDs to human readable strings?)<br>
</li>
<li>a unique reference ID that can be used to identify values in
other parts of the monitoring application</li>
<li>list of event ids that invoke this monitor</li>
<li>name of a primary filter and parameters to this primary filter;
these parameters may include eventid-s to generate</li>
<li>priority: used to drop data in case of buffer overflows</li>
<li>timeout<br>
</li>
</ul>
<li>Trap monitor</li>
<ul>
<li>agent id</li>
<li>monitored variable (the OID to query)</li>
<li>a unique reference ID that can be used to identify values in
other parts of the monitoring application</li>
<li>name of a primary filter and parameters to this primary filter;
these parameters may include eventid-s to generate</li>
<li>TBD</li>
<li>priority: used to drop data in case of buffer overflows<br>
</li>
</ul>
<li>Agent id</li>
<ul>
<li>a network address</li>
<li>a port number</li>
<li>credentials</li>
<li>SNMP version</li>
<li>group ID</li>
</ul>
<li>Schedule</li>
<ul>
<li>a schedule expressed in poll frequency, time of day, etc</li>
<li>eventid to generate</li>
</ul>
</ul>
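<p>To show how these entities relate, two of them are sketched below as
plain Java value classes. The class and field names are hypothetical and
do not prescribe a table layout; credentials and primary filter
parameters are omitted because their representation is not settled in
this document.<br>
</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory form of the "Agent id" entity. */
final class AgentConfig {
    final String agentId;
    final String networkAddress;
    final int port;
    final String snmpVersion; // e.g. "v1", "v2c" or "v3"
    final String groupId;     // may be null; a group id is then generated from the agent id

    AgentConfig(String agentId, String networkAddress, int port,
                String snmpVersion, String groupId) {
        this.agentId = agentId;
        this.networkAddress = networkAddress;
        this.port = port;
        this.snmpVersion = snmpVersion;
        this.groupId = groupId;
    }
}

/** Hypothetical in-memory form of the "Variable monitor" entity. */
final class VariableMonitorConfig {
    final String agentId;                  // refers to an AgentConfig
    final String oid;                      // the monitored variable
    final String referenceId;              // identifies values elsewhere in the application
    final List<String> triggeringEventIds; // events that invoke this monitor
    final String primaryFilterName;
    final int priority;                    // used to drop data on buffer overflow
    final long timeoutMillis;

    VariableMonitorConfig(String agentId, String oid, String referenceId,
                          List<String> triggeringEventIds, String primaryFilterName,
                          int priority, long timeoutMillis) {
        this.agentId = agentId;
        this.oid = oid;
        this.referenceId = referenceId;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.triggeringEventIds =
                Collections.unmodifiableList(new ArrayList<String>(triggeringEventIds));
        this.primaryFilterName = primaryFilterName;
        this.priority = priority;
        this.timeoutMillis = timeoutMillis;
    }
}
```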
<span style="font-weight: bold;">BC Configuration</span> -- the primary
configuration of the SNMP BC is done through WSDL extensibility
elements. The configuration items are:<br>
<ul>
<li>location of configuration data store</li>
<li>port to listen on</li>
<li>buffer limit</li>
<li>throttle limit<br>
</li>
</ul>
<h3><a name="mozTocId568414" class="mozTocH3"></a>Data propagation</h3>
<p><span style="font-weight: bold;">XSD</span> -- Variable monitors and
trap monitors may generate data that is sent to an SE. The schema of
this data is TBD.<br>
</p>
<p><span style="font-weight: bold;">Message exchange pattern</span> --
Considering the lack of reliability, <span style="font-style: italic;">in-only</span>
is preferred (TBD).<br>
</p>
<h3><a name="mozTocId204730" class="mozTocH3"></a>Configuration tool</h3>
<p>To populate the configuration store a separate tool may be required.
This tool may need to be able to load MIBs and allow the user to select
OIDs from the MIBs. This tool will be described in more detail later.<br>
</p>
<br>
</body>
</html>

This page (revision-16) was last changed on 04-Mar-10 15:28 PM, -0800 by FrankKieviet