
Frank Kieviet / Philip Chan
Revisions:
| Who | When | What |
| Frank Kieviet | 12-01-06 | Created |
| Frank Kieviet | 12-05-06 | Completed list of architectural concerns |
The SNMP BC (binding component) can be used to monitor large numbers of
SNMP variables and forward the data for further processing. This further
processing is done in an SE (service engine). A description of possible
SEs is outside the scope
of this project, but it is likely that this SE will perform further
slicing and dicing of the data, and store the resulting data in a
relational database. An SE that may be considered in this scenario is
the IEP.
The SNMP BC is meant to be the data collecting agent in a larger
monitoring application. Other components in this larger monitoring
application may include the aforementioned data analysis SE, but also
business rules engines, visual applications to provide operators with a
graphical view of the health of the system, alerting components, trend
analysis components, historical analysis modules, etc.
The focus of the SNMP BC in this larger application is on
monitoring. Monitoring SNMP variables is done both by actively querying
SNMP variables and by passively receiving SNMP traps and informs. Active
querying is done by scheduled polling of specified SNMP variables.
Before propagation to the SE, data is passed through a primary filter
to remove redundant data, e.g. a poll of a boolean variable will only
propagate changes in the monitored value. The primary filter is limited
in functionality and is
not meant as a place to introduce elaborate business logic: the SE is
the place for that.
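The change-only behavior described above can be sketched as follows. This is a minimal illustration, not the BC's actual filter: the class name `ChangeOnlyFilter` and its `accept` method are hypothetical, and the OID used in the usage note is a made-up placeholder.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical primary filter: lets a polled value through only when it
 * differs from the last value seen for the same OID.
 */
class ChangeOnlyFilter {
    // Last value seen per OID; an OID that was never polled has no entry.
    private final Map<String, Object> lastValues = new HashMap<String, Object>();

    /** Returns true if the value should be propagated to the SE. */
    synchronized boolean accept(String oid, Object value) {
        if (lastValues.containsKey(oid)) {
            Object previous = lastValues.get(oid);
            boolean same = (previous == null) ? value == null : previous.equals(value);
            if (same) {
                return false; // redundant: nothing changed since the last poll
            }
        }
        lastValues.put(oid, value);
        return true; // first observation or a changed value
    }
}
```

With this sketch, polling a boolean variable three times with the values false, false, true would propagate the first and the third poll only. Treating the first observation as a change is an assumption of the sketch.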
What the SNMP BC will not do: the SNMP BC does not support an
external interface that can be used to set SNMP variables or poll
variables on demand (negotiable). No protocols other than UDP are
supported.
This description of SNMP is confined to those aspects of SNMP that
are relevant for the design decisions in this document.
SNMP is a protocol for network management. It is primarily a request / reply protocol (GET and SET). Agents are devices that listen for these GET / SET requests; managers are applications that send these requests to agents. Next to GET / SET, SNMP also supports a way for agents to send unsolicited notifications to the manager (TRAP / INFORM).
SNMP can support multiple networking protocols, but the most popular
one is UDP. SNMP over UDP is not a reliable protocol: packets may be
dropped. Each GET, SET, TRAP or INFORM request or message is
transmitted in one UDP packet. Each packet contains only one data item.
UDP packets are of limited size (< 500 bytes for some network
infrastructures). Data items are limited to a handful of primitive
values, essentially just ints and strings. Data is encoded using BER.
The data items that can be queried from or set on agents are
organized in a tree. This tree is comparable to a database schema. The
schema language used to describe this tree is a subset of ASN.1 called
the SMI. The nodes in the tree are addressable using object ids (OIDs).
OIDs are globally unique. OIDs are a sequence of period delimited
numbers. The tree that an agent supports, together with its OIDs, is called a MIB.
Although there have been efforts to standardize MIBs, there are more
than 2300 cataloged MIBs with over 1,000,000 OIDs.
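To make the OID notation concrete, here is a small sketch of a value class that parses a period-delimited OID and tests whether it lies under a given subtree. The class `Oid` and its methods are hypothetical names chosen for illustration. The only SNMP facts assumed are that sub-identifiers are unsigned 32-bit numbers and that 1.3.6.1.2.1.1.3.0 is the standard sysUpTime instance.

```java
import java.util.Arrays;

/** Hypothetical value class for a dotted OID such as 1.3.6.1.2.1.1.3.0. */
final class Oid {
    // Sub-identifiers are unsigned 32-bit values, so long is used to hold them.
    private final long[] parts;

    private Oid(long[] parts) {
        this.parts = parts;
    }

    /** Parses "1.3.6.1..." style text; rejects empty or out-of-range parts. */
    static Oid parse(String text) {
        String[] tokens = text.split("\\.", -1); // -1 keeps trailing empty tokens
        long[] parts = new long[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            parts[i] = Long.parseLong(tokens[i]); // NumberFormatException on bad input
            if (parts[i] < 0 || parts[i] > 0xFFFFFFFFL) {
                throw new IllegalArgumentException("sub-identifier out of range: " + tokens[i]);
            }
        }
        return new Oid(parts);
    }

    /** True if this OID is the given node or lies below it in the tree. */
    boolean startsWith(Oid prefix) {
        if (prefix.parts.length > parts.length) {
            return false;
        }
        for (int i = 0; i < prefix.parts.length; i++) {
            if (parts[i] != prefix.parts[i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Oid && Arrays.equals(parts, ((Oid) other).parts);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(parts);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```

For example, `Oid.parse("1.3.6.1.2.1.1.3.0").startsWith(Oid.parse("1.3.6.1.2.1"))` holds, because sysUpTime lives under the mib-2 subtree.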
Although the data is organized in a tree, MIBs can describe tables
of data. This is done in a complicated way. Each cell in the table ends
up with its own OID. Cells can be read only one at a time through GET requests.
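As an illustration of how a table cell ends up with its own OID: the cell's OID is the column's OID followed by the row index. The helper below is a hypothetical sketch. The example in the usage note uses the ifDescr column of the standard interfaces table (1.3.6.1.2.1.2.2.1.2), whose rows are indexed by the interface number.

```java
/** Hypothetical helper: builds the OID of one table cell. */
final class TableCells {
    private TableCells() {
    }

    /**
     * Appends the row index to the column OID. The index may consist of
     * several sub-identifiers for tables with a compound index.
     */
    static String cellOid(String columnOid, long... rowIndex) {
        StringBuilder sb = new StringBuilder(columnOid);
        for (long part : rowIndex) {
            sb.append('.').append(part);
        }
        return sb.toString();
    }
}
```

`TableCells.cellOid("1.3.6.1.2.1.2.2.1.2", 3)` yields `1.3.6.1.2.1.2.2.1.2.3`, the description of interface 3. A monitor for a tabular variable would issue one GET per such cell.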
MIBs typically refer to many data items, e.g. querying the MIB for
the Java VM results in more than 400 data items. Much of this data is
static configuration information; there are only a handful of data items
that contain information important for management, e.g. memory size,
and thread count.
Unit of monitoring -- The unit of monitoring can be thought to be a
complete MIB or individual variables (OIDs) in a MIB. Considering the
fact that the number of data items in a MIB that are relevant for
monitoring is a small portion (in the case of the Java VM MIB this is
about 1%), it makes more sense to use individual OIDs as the unit of
monitoring than to use a complete MIB as the unit.
The structure of data being
processed -- Monitored variables can be considered as
stand-alone entities, or can be thought of as nodes in the MIB tree.
For the latter it should be considered that MIBs can be converted into
XSDs (either a single small XSD describing all possible MIBs or one XSD
per MIB), and hence the data of a device can be converted into XML.
However, the consumer of the data, i.e. the consuming SE (e.g. the
IEP), will likely prefer scalar or tabular data over complex structures
because of the large number of MIBs involved. This large number of MIBs
is also the reason that the design of the BC will be simplified if the
BC does not need to have detailed knowledge of all MIBs involved.
Therefore the structure of the data (the entities) should be scalar or
tabular data.
Performance: reduction of the number of data items produced --
Components consuming the data produced by the BC are likely more
interested in events than in raw data values. E.g. an uptime variable
may be used to detect if and when a system was restarted rather than
monitoring a monotonically increasing time value. Similarly, a boolean
system down variable is only of
interest if its value changes. Since the number of monitored data items
will be large, it will be useful if there is a facility for primary
filtering of data to reduce the number of useless data items that the
consuming SE receives. It is not the intention to introduce business
rules here: this remains the responsibility of the consuming SE.
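The uptime example can be sketched as a small filter that turns raw uptime values into a "restarted" event. This is an assumption-laden illustration: the class `RestartDetector` is hypothetical, and it assumes the uptime is delivered as SNMP TimeTicks (hundredths of a second in a 32-bit counter). Such a counter wraps after roughly 497 days, and this sketch would report that wrap as a restart.

```java
/**
 * Hypothetical primary filter: reports a restart when the polled uptime
 * is lower than the previously polled uptime.
 */
class RestartDetector {
    // -1 means that no uptime value has been seen yet.
    private long lastUptimeTicks = -1;

    /**
     * Feeds one polled uptime value (in TimeTicks) into the detector.
     * Returns true only when the value went backwards, i.e. when the
     * device appears to have restarted since the previous poll.
     */
    synchronized boolean restarted(long uptimeTicks) {
        boolean restart = lastUptimeTicks >= 0 && uptimeTicks < lastUptimeTicks;
        lastUptimeTicks = uptimeTicks;
        return restart;
    }
}
```

Only the polls for which `restarted` returns true would need to be forwarded to the consuming SE; the steadily increasing values in between carry no new information.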
Performance: reduction of poll rate -- Many data items in a MIB do not
change after a device is booted. An example is a dump of the system
properties of the Java VM MIB. An SE may nevertheless need to store
these variables in a relational database once. If there is a mechanism
to relate one event to another, the number of useless data item polls
can be reduced. For example, the system properties of the Java VM
should only be polled if a system restart is detected. Again, the
intention is not to introduce business rules: that remains the
responsibility of the consuming SE.
Configuration -- Usually BCs are configured through WSDL extensibility
elements. However, because of the large number of OIDs to be
configured, and because of the requirement of dynamic configuration,
this is not a feasible approach. Hence, the configuration should reside
in a data store that can be manipulated independently of the
deployment, e.g. a relational database. Note that configuration is
likely done by personnel other than "comp app designers" or "deployers"
and may need to be done through remote consoles (e.g. a web browser).
Also note that because of the large number of monitored devices,
configuration changes likely occur often (e.g. once or multiple times
per day).
Scalability, reliability, availability -- For reasons of scalability,
it should be possible to run the SNMP BC on several machines at the
same time (horizontal scalability). These multiple instances should use
the same configuration store, yet they should not poll the same data
elements. For automatic failover, each instance should be aware of
other instances going down and should be able to take over from
these failed instances automatically.
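One possible way to meet both requirements (no two instances polling the same data element, and automatic takeover) is to derive the owner of each data element from the set of instances that are currently alive. The sketch below is only an illustration of that idea and not a decided design: the class `InstancePartitioner`, the string keys and the instance names are all hypothetical, and how instances learn which peers are alive is left open.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: decides which live instance polls a data element. */
final class InstancePartitioner {
    private InstancePartitioner() {
    }

    /**
     * Returns the instance that owns the given key (e.g. "device/OID").
     * Every instance that sees the same set of live instances computes
     * the same owner, so no coordination is needed for the assignment.
     */
    static String ownerOf(String monitoredKey, Collection<String> liveInstances) {
        if (liveInstances.isEmpty()) {
            throw new IllegalArgumentException("no live instances");
        }
        // Sort and de-duplicate so that the result does not depend on input order.
        List<String> sorted = new ArrayList<String>(new TreeSet<String>(liveInstances));
        int n = sorted.size();
        // String.hashCode() is specified by the language, so it is stable across JVMs.
        int index = ((monitoredKey.hashCode() % n) + n) % n;
        return sorted.get(index);
    }
}
```

When an instance disappears from the live set, the keys it owned are assigned to the remaining instances on the next evaluation. A simple modulo scheme like this also moves keys between healthy instances when the set changes; a consistent hashing scheme would reduce that, at the cost of more complexity.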
The SNMP BC should be able to leverage multi-core and hyper-threaded
CPUs (vertical scaling) through a multi-threaded design.
Extensibility, maintainability, testability -- The SNMP BC is
organized in a number of separate libraries that can be fully tested
outside of JBI. Details on the division into separate libraries are yet
to follow.
Manageability -- Information about the internal state of the SNMP BC,
e.g. queue sizes, thread pool sizes, etc., can be obtained through JMX.
Performance counters etc. can be reset through JMX. Performance related
configuration parameters are
set in the WSDL extensibility elements and will not be changeable
through JMX.
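A minimal sketch of what such a JMX view could look like is given below. The attribute names, the class names and the object name used in the usage note are hypothetical; only the JMX calls themselves (`StandardMBean`, `MBeanServer.registerMBean`, the platform MBean server) are standard API. The view exposes read-only attributes plus a reset operation and deliberately offers no setters, in line with configuration not being changeable through JMX.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

/** Hypothetical holder for the JMX sketch. */
final class SnmpBcJmxSketch {
    private SnmpBcJmxSketch() {
    }

    /** Management view; the interface is public because JMX requires that. */
    public interface Stats {
        int getQueueSize();      // read-only attribute "QueueSize"

        long getDroppedCount();  // read-only attribute "DroppedCount"

        void resetCounters();    // operation that resets the counters
    }

    /** Implementation updated by the BC internals and read through JMX. */
    static final class StatsImpl implements Stats {
        private final AtomicLong dropped = new AtomicLong();
        private volatile int queueSize;

        void recordDrop() {
            dropped.incrementAndGet();
        }

        void setQueueSize(int size) {
            queueSize = size;
        }

        public int getQueueSize() {
            return queueSize;
        }

        public long getDroppedCount() {
            return dropped.get();
        }

        public void resetCounters() {
            dropped.set(0);
        }
    }

    /** Registers the stats object with the platform MBean server. */
    static ObjectName register(StatsImpl stats, String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // replace a stale registration
            }
            // StandardMBean exposes only the methods of the given interface.
            server.registerMBean(new StandardMBean(stats, Stats.class), name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException("could not register MBean", e);
        }
    }
}
```

After `SnmpBcJmxSketch.register(stats, "example.snmpbc:type=Stats")`, a JMX console such as JConsole would show the two attributes and the `resetCounters` operation under that name.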
Security -- The safe storage
of credentials in the configuration store is a special concern. File
system and/or database security will serve as the primary mechanism to
safeguard this information. Obfuscation (encryption with a hard-coded
key) can be used as a secondary mechanism.
Portability -- The SNMP BC depends on JDK 5. It may rely on classes in Glassfish and the Sun JDK (negotiable). There will be no dependence on operating system or hardware platform.
Event mechanism -- An event
mechanism, i.e. event listeners and event generators, will be used as
the conceptual model for the monitoring of devices. Variable monitors
act like event listeners: they are triggered by events. When a variable
monitor is triggered by an event, it polls the monitored variable.
Event sources include timers and triggers. Variable monitors can also
act as event generators: for example, a change in a boot time variable
may cause an event that will invoke other variable monitors.
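The listener / generator roles described above can be sketched as follows. All names here (`MonitorEventListener`, `Poller`, `VariableMonitor`) are hypothetical, and `Poller` merely stands in for the SNMP GET that a real monitor would perform. The sketch assumes that the first poll of a variable counts as a change.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener contract: anything that can be triggered by an event. */
interface MonitorEventListener {
    void onEvent(String eventName);
}

/** Stand-in for an SNMP GET; a real BC would query the agent here. */
interface Poller {
    Object poll(String oid);
}

/** Listens for events, polls its variable, and raises an event on change. */
final class VariableMonitor implements MonitorEventListener {
    private final String oid;
    private final Poller poller;
    private final List<MonitorEventListener> listeners = new ArrayList<MonitorEventListener>();
    private Object lastValue; // null until the first poll
    private int pollCount;

    VariableMonitor(String oid, Poller poller) {
        this.oid = oid;
        this.poller = poller;
    }

    /** Registers a listener (e.g. another monitor) for this monitor's events. */
    void addListener(MonitorEventListener listener) {
        listeners.add(listener);
    }

    int getPollCount() {
        return pollCount;
    }

    /** Any event (timer or trigger) causes one poll of the monitored variable. */
    public void onEvent(String eventName) {
        Object value = poller.poll(oid);
        pollCount++;
        boolean changed = (lastValue == null) ? value != null : !lastValue.equals(value);
        lastValue = value;
        if (changed) {
            // Act as an event generator: trigger the dependent monitors.
            for (MonitorEventListener listener : listeners) {
                listener.onEvent("changed:" + oid);
            }
        }
    }
}
```

Wiring a boot time monitor to a system properties monitor with `bootTime.addListener(systemProperties)` means the properties are polled only when the boot time changes, while the boot time monitor itself is driven by a timer.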
Variable monitors -- each
monitored variable will have a variable monitor which is responsible
for invoking the GET
operation. Variable monitors are invoked by events, have primary
filters which may cause other events to be thrown, and may produce
output that is sent to the SE. Note that variables may be scalar or
tabular.
Trap monitors -- a trap monitor is invoked by an SNMP trap or inform.
Like variable monitors, it invokes primary filters, which in turn may
cause events, and may produce output that is sent to the SE.
Timer events -- Schedules express a series of points in time at which
a timer event should be raised, e.g. every 5 seconds between 09:00 and
18:00 from Monday to Friday.
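The example schedule can be expressed in code roughly as below. This is a sketch with hypothetical names (`PollSchedule`, `firesAt`). It uses the `java.time` API, which requires Java 8 and is therefore newer than the JDK 5 baseline named in this document, and it assumes that the end of the window (18:00) is exclusive.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical schedule: a fixed interval inside a daily window on given days. */
final class PollSchedule {
    private final long intervalSeconds;
    private final LocalTime start; // inclusive
    private final LocalTime end;   // exclusive (an assumption of this sketch)
    private final Set<DayOfWeek> days;

    PollSchedule(long intervalSeconds, LocalTime start, LocalTime end, Set<DayOfWeek> days) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.intervalSeconds = intervalSeconds;
        this.start = start;
        this.end = end;
        this.days = days;
    }

    /** Every 5 seconds between 09:00 and 18:00, Monday to Friday. */
    static PollSchedule everyFiveSecondsOfficeHours() {
        return new PollSchedule(5, LocalTime.of(9, 0), LocalTime.of(18, 0),
                EnumSet.range(DayOfWeek.MONDAY, DayOfWeek.FRIDAY));
    }

    /** True if a timer event is due at exactly this point in time. */
    boolean firesAt(LocalDateTime t) {
        LocalTime time = t.toLocalTime();
        if (!days.contains(t.getDayOfWeek())) {
            return false; // wrong day of the week
        }
        if (time.isBefore(start) || !time.isBefore(end)) {
            return false; // outside the daily window
        }
        if (time.getNano() != 0) {
            return false; // events fall on whole seconds only
        }
        // Due when a whole number of intervals has elapsed since the window start.
        return Duration.between(start, time).getSeconds() % intervalSeconds == 0;
    }
}
```

A timer thread could evaluate `firesAt` once per second and raise a timer event whenever it returns true; computing the next due time directly would be more efficient but is omitted here for brevity.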
Primary filters -- variable monitors can have one or more primary
filters. If there are multiple filters, these filters are not chained
in series but run in parallel, independently of each other.
Here are a few examples of primary filters:
Buffering -- traps and data monitoring may produce data at a rate
higher than the consuming SE can process. To absorb such bursts, the BC
has a limited buffer. If this buffer overflows, data will be dropped
from the buffer. Data with a lower priority will be dropped from the
buffer before data with a higher priority. Dropping data from the
buffer is highly undesirable, and special measures will be taken so
that this situation can only occur if there is a flood of traps. Should
data be dropped, this will be logged and alerts will be raised.
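The dropping policy can be sketched as a bounded buffer that evicts the lowest-priority entry when it is full. The class `PriorityDropBuffer` is hypothetical, as are two choices the text leaves open: a higher number means a higher priority, and an incoming item that does not outrank the lowest-priority entry is itself dropped. Logging and alerting are reduced to a counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded FIFO buffer that drops low-priority data first. */
final class PriorityDropBuffer<T> {
    private static final class Entry<T> {
        final T item;
        final int priority; // higher number = more important (assumption)

        Entry(T item, int priority) {
            this.item = item;
            this.priority = priority;
        }
    }

    private final int capacity;
    private final List<Entry<T>> entries = new ArrayList<Entry<T>>();
    private long droppedCount;

    PriorityDropBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Adds an item; returns false if the item itself had to be dropped. */
    synchronized boolean offer(T item, int priority) {
        if (entries.size() < capacity) {
            entries.add(new Entry<T>(item, priority));
            return true;
        }
        // Buffer is full: find the oldest entry with the lowest priority.
        int victim = 0;
        for (int i = 1; i < entries.size(); i++) {
            if (entries.get(i).priority < entries.get(victim).priority) {
                victim = i;
            }
        }
        droppedCount++; // a real BC would also log and raise an alert here
        if (entries.get(victim).priority >= priority) {
            return false; // the incoming item does not outrank anything buffered
        }
        entries.remove(victim);
        entries.add(new Entry<T>(item, priority));
        return true;
    }

    /** Removes and returns the oldest buffered item, or null if empty. */
    synchronized T poll() {
        return entries.isEmpty() ? null : entries.remove(0).item;
    }

    synchronized long getDroppedCount() {
        return droppedCount;
    }
}
```

With a capacity of 2, offering a poll result at priority 1, a trap at priority 5 and an inform at priority 3 leaves the trap and the inform in the buffer; the poll result is the one that is dropped.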
To reduce the likelihood of dropped data, most operations are
buffered and some types of operations take precedence over other types
of operations:
Configuration store -- the
configuration data store will have at least the following entities:
XSD -- Variable monitors and
trap monitors may generate data that is sent to an SE. The schema of
this data is TBD.
Message exchange pattern --
Considering the lack of reliability, in-only is preferred (TBD).
To populate the configuration store a separate tool may be required.
This tool may need to be able to load MIBs and allow the user to select
OIDs from the MIBs. This tool will be described in more detail later.