Types of monitoring

There are many different types of monitoring tool, and each is suited to different situations. This section describes several types of monitoring and when each is appropriate.

Active versus Passive Monitoring

Active Monitoring refers to the ongoing ‘interrogation’ of a device or system to determine its status. This type of monitoring can be resource intensive. It is usually reserved for proactively monitoring the availability of critical devices or systems, or for use as a diagnostic step when attempting to resolve an Incident or diagnose a problem.
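The Python sketch below illustrates what ‘interrogation’ can mean in practice. It polls each target and records whether the target answered. Targets are identified by a hypothetical (host, port) pair. The probe is a TCP connection attempt, which stands in for whatever check a real tool performs (an ICMP ‘ping’, for instance, usually needs elevated privileges). Every cycle opens a connection to every target, which is why frequent polling of many devices becomes resource intensive.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical representation of a monitored CI: (host, port).
Target = Tuple[str, int]


def tcp_probe(target: Target, timeout: float = 2.0) -> bool:
    """Interrogate one target by attempting a TCP connection.

    Returns True if the connection succeeds within `timeout` seconds.
    """
    host, port = target
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def active_poll(
    targets: Iterable[Target],
    probe: Callable[[Target], bool] = tcp_probe,
) -> Dict[Target, str]:
    """Run one polling cycle: the monitor asks every target for its status."""
    return {target: ("up" if probe(target) else "down") for target in targets}
```

A scheduler would typically call `active_poll` at a fixed interval for critical CIs only. The `probe` parameter lets a diagnostic script substitute a different check during Incident resolution.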
Passive Monitoring is more common and refers to generating and transmitting events to a ‘listening device’ or monitoring agent. Passive Monitoring depends on successful definition of events and instrumentation of the system being monitored (see section 4.1).
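In Passive Monitoring, by contrast, the monitoring agent never asks for status. Instead, instrumented systems push events to it. The sketch below models the ‘listening device’ as a hypothetical `MonitoringAgent` class whose `receive` method the instrumented components would call. The `Event` fields and event kinds are illustrative assumptions, not a defined event format. The approach only works if events have been defined and the systems instrumented to emit them, which is the dependency described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass
class Event:
    source: str       # CI that generated the event
    kind: str         # event type, e.g. "disk_full" (hypothetical names)
    detail: str = ""


class MonitoringAgent:
    """A 'listening device': it never queries systems, it only receives events."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.received: List[Event] = []

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        """Register a handler for one event kind."""
        self.handlers[kind].append(handler)

    def receive(self, event: Event) -> None:
        """Called by the instrumented system (push), not by the agent (poll)."""
        self.received.append(event)
        for handler in self.handlers.get(event.kind, []):
            handler(event)
```

In this sketch, every event is recorded in `received` even when no handler is subscribed, so it remains available for later correlation.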

Reactive versus Proactive

Reactive Monitoring is designed to request or trigger action following a certain type of event or failure. For example, server performance degradation may trigger a reboot, or a system failure may generate an incident. Reactive Monitoring is not only used for exceptions; it can also be part of normal operations procedures. For example, when a batch job completes successfully, the scheduling system is prompted to submit the next batch job.
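Reactive Monitoring can be pictured as a table of rules, each mapping an event type to a predefined action. The Python sketch below encodes the three examples from the text: performance degradation, system failure and successful batch job completion. The event type names, the `NEXT_JOB` schedule and the action strings are hypothetical. A real tool would call a scheduler or an Incident Management system rather than return text.

```python
from typing import Callable, Dict, Optional

# Hypothetical batch schedule: the job that follows each job.
NEXT_JOB: Dict[str, str] = {"extract": "transform", "transform": "load"}


def on_job_succeeded(ci: str) -> Optional[str]:
    # Normal operations: successful completion prompts the next job, if any.
    nxt = NEXT_JOB.get(ci)
    return f"submit job {nxt}" if nxt else None


def on_performance_degraded(ci: str) -> Optional[str]:
    return f"reboot {ci}"


def on_system_failed(ci: str) -> Optional[str]:
    return f"open incident for {ci}"


# Each event type is mapped to the action it should trigger.
RULES: Dict[str, Callable[[str], Optional[str]]] = {
    "job_succeeded": on_job_succeeded,
    "performance_degraded": on_performance_degraded,
    "system_failed": on_system_failed,
}


def react(event_type: str, ci: str) -> Optional[str]:
    """Return the action triggered by an event, or None if no rule applies."""
    rule = RULES.get(event_type)
    return rule(ci) if rule else None
```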
Proactive Monitoring is used to detect patterns of events which indicate that a system or service may be about to fail. It is generally used in more mature environments where these patterns have been detected previously, often several times. Proactive Monitoring tools are therefore a means of automating the experience of seasoned IT staff, and they are often created through the Proactive Problem Management process (see the Continual Service Improvement publication).
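Proactive Monitoring looks for a combination of events rather than a single one. The sketch below flags a CI when every event kind in a known precursor pattern occurs on that CI within a time window. The pattern, the 600-second window and the event names are assumptions. They stand in for what Proactive Problem Management would have learned from previous failures.

```python
from typing import Dict, FrozenSet, List, Tuple

# (timestamp in seconds, CI name, event kind)
EventRecord = Tuple[float, str, str]

# Hypothetical precursor pattern, e.g. captured by Proactive Problem Management
# after several similar outages.
PRECURSOR: FrozenSet[str] = frozenset({"disk_latency_high", "db_retry", "queue_backlog"})


def at_risk(
    events: List[EventRecord],
    pattern: FrozenSet[str] = PRECURSOR,
    window: float = 600.0,
) -> List[str]:
    """Return CIs on which every event kind in `pattern` occurred within `window` seconds."""
    by_ci: Dict[str, List[Tuple[float, str]]] = {}
    for ts, ci, kind in sorted(events):
        by_ci.setdefault(ci, []).append((ts, kind))

    flagged: List[str] = []
    for ci, records in by_ci.items():
        # Try each event as the start of a window and collect the kinds seen inside it.
        for i, (start, _) in enumerate(records):
            kinds = {kind for ts, kind in records[i:] if ts - start <= window}
            if pattern <= kinds:
                flagged.append(ci)
                break
    return flagged
```

The nested scan is quadratic in the number of events per CI. That is acceptable for a sketch, but a production correlation engine would use a sliding window.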

Both Reactive and Proactive Monitoring can be either active or passive, as shown in Table 5.1:

	Active	Passive
Reactive	Used to diagnose which device is causing the failure and under what conditions (e.g. ‘ping’ a device, or run and track a sample transaction through a series of devices). Requires knowledge of the infrastructure topology and the mapping of services to CIs	Detects and correlates event records to determine the meaning of the events and the appropriate action (e.g. a user logs in three times with an incorrect password, which represents a security exception and is escalated through Information Security Management procedures). Requires detailed knowledge of the normal operation of the infrastructure and services
Proactive	Used to determine the real-time status of a device, system or service, usually for critical components or following the recovery of a failed device to ensure that it is fully recovered (i.e. is not going to cause further incidents)	Event records are correlated over time to build trends for Proactive Problem Management. Patterns of events are defined and programmed into correlation tools for future recognition

Table 5.1 Active and Passive Reactive and Proactive Monitoring
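The passive reactive example in Table 5.1 is a simple correlation rule: three incorrect password attempts are treated as a security exception. The sketch below counts failed logins per user in a sliding time window. The limit of three comes from the table. The 300-second window and the `(timestamp, user)` event format are assumptions, since the table does not say how close together the attempts must be.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple


def correlate_failed_logins(
    failures: List[Tuple[float, str]],
    limit: int = 3,
    window: float = 300.0,
) -> List[str]:
    """Return users with at least `limit` failed logins inside a sliding `window` (seconds).

    `failures` holds (timestamp, user) records of incorrect-password events.
    """
    recent: DefaultDict[str, Deque[float]] = defaultdict(deque)
    escalated: List[str] = []
    for ts, user in sorted(failures):
        attempts = recent[user]
        attempts.append(ts)
        # Drop attempts that fell out of the window; the current one always stays.
        while ts - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= limit and user not in escalated:
            escalated.append(user)  # candidate for Information Security Management
    return escalated
```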

Continuous Measurement versus Exception-Based Measurement

Continuous Measurement focuses on monitoring a system in real time to ensure that it complies with a performance norm (for example, an application server is available for 99.9% of the agreed service hours). The difference between Continuous Measurement and Active Monitoring is that Active Monitoring does not have to be continuous. Like Active Monitoring, however, Continuous Measurement is resource intensive and is usually reserved for critical components or services. In most cases the cost of the additional bandwidth and processor power outweighs the benefit of continuous measurement, so monitoring is usually based on sampling and statistical analysis instead (e.g. system performance is reported every 30 seconds and extrapolated to represent overall performance). Where sampling is used, the method of measurement must be documented and agreed in the OLAs to ensure that it is adequate to support the Service Reporting Requirements (see the Continual Service Improvement publication).
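The sampling approach can be sketched as follows. Status is recorded every 30 seconds, as in the example above. The fraction of ‘up’ samples is then extrapolated to an availability percentage and compared against a 99.9% target. The function names are hypothetical. The sketch also computes the downtime a 99.9% target allows, which shows why sampling granularity matters. Over 720 agreed service hours (e.g. 24x7 for 30 days), the allowance is 43.2 minutes. An outage shorter than one 30-second interval may not appear in the samples at all.

```python
from typing import Sequence

SAMPLE_INTERVAL_S = 30  # the text's example: status reported every 30 seconds


def estimated_availability(samples: Sequence[bool]) -> float:
    """Extrapolate availability (%) from periodic up/down samples.

    Each sample is taken to represent its whole interval, so an outage
    shorter than SAMPLE_INTERVAL_S can fall between samples and be missed.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return 100.0 * sum(samples) / len(samples)


def estimated_downtime_minutes(samples: Sequence[bool]) -> float:
    """Downtime implied by the 'down' samples, one interval per sample."""
    return (len(samples) - sum(samples)) * SAMPLE_INTERVAL_S / 60.0


def allowed_downtime_minutes(service_hours: float, target_pct: float = 99.9) -> float:
    """Downtime permitted by an availability target over the agreed service hours."""
    return service_hours * 60 * (100.0 - target_pct) / 100.0


def meets_target(samples: Sequence[bool], target_pct: float = 99.9) -> bool:
    return estimated_availability(samples) >= target_pct
```

For example, one hour of samples (120 readings) with a single ‘down’ reading gives about 99.17% availability and 0.5 minutes of estimated downtime. That reading misses a 99.9% target for the hour.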
Exception-Based Measurement does not measure the real-time performance of a service or system, but detects and reports against exceptions. For example, an event is generated if a transaction does not complete, or if a performance threshold is reached. This is more cost-effective and easier to measure, but it could result in longer service outages. Exception-Based Measurement is used for less critical systems or for systems where cost is a major issue. It is also used where IT tools are not able to determine the status or quality of a service (e.g. if print quality is part of the service specification, the only way to measure it is physical inspection, which is often performed by the user rather than IT staff). Where Exception-Based Measurement is used, it is important that both the OLA and the SLA for that service reflect this, because service outages are more likely to occur and users are often required to report the exception.
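Exception-Based Measurement, in contrast, stays silent while things are normal. The sketch below emits an event only when a transaction fails to complete or its response time reaches a threshold. The `Transaction` record and the 2,000 ms threshold are hypothetical values. In practice, they would be agreed in the OLA and SLA.

```python
from dataclasses import dataclass
from typing import List, Optional

RESPONSE_THRESHOLD_MS = 2000.0  # hypothetical threshold, to be agreed in the OLA/SLA


@dataclass
class Transaction:
    txn_id: str
    completed: bool
    response_ms: Optional[float] = None  # None when the transaction never finished


def exceptions_only(
    txns: List[Transaction], threshold_ms: float = RESPONSE_THRESHOLD_MS
) -> List[str]:
    """Report only exceptions; normal transactions produce no event at all."""
    events: List[str] = []
    for t in txns:
        if not t.completed:
            events.append(f"{t.txn_id}: did not complete")
        elif t.response_ms is not None and t.response_ms >= threshold_ms:
            events.append(f"{t.txn_id}: threshold reached ({t.response_ms:.0f} ms)")
    return events
```

Nothing is recorded about the transactions that behaved normally. This is why the approach is cheaper, and also why it cannot report real-time performance.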

Performance versus output

There is an important distinction between the reporting used to track the performance of the components, teams or departments used to deliver a service and the reporting used to demonstrate the achievement of service quality objectives.

IT managers often confuse these by reporting to the business on the performance of their teams or departments (e.g. number of calls taken per Service Desk Analyst), as if that were the same thing as quality of service (e.g. incidents solved within the agreed time).

Performance Monitoring and metrics should be used internally by Service Management to determine whether people, process and technology are functioning correctly and to standard.

Users and customers would rather see reporting related to the quality and performance of the service.
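The sketch below makes the distinction concrete by computing both kinds of metric from the same hypothetical incident records. The first is calls handled per Service Desk Analyst, an internal performance metric. The second is the percentage of incidents resolved within the agreed time, a service quality (output) metric. The `Incident` fields are assumptions. The point is that the first number can look healthy while the second, which is what the business experiences, does not.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Incident:
    analyst: str             # Service Desk Analyst who took the call
    resolution_hours: float  # time actually taken to resolve
    target_hours: float      # agreed resolution time (hypothetical SLA value)


def calls_per_analyst(incidents: List[Incident]) -> Dict[str, int]:
    """Internal performance metric: how much work each analyst handled."""
    return dict(Counter(i.analyst for i in incidents))


def pct_resolved_within_target(incidents: List[Incident]) -> float:
    """Output (service quality) metric: share of incidents resolved in the agreed time."""
    if not incidents:
        raise ValueError("no incidents to report on")
    met = sum(1 for i in incidents if i.resolution_hours <= i.target_hours)
    return 100.0 * met / len(incidents)
```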

Although Service Operation is concerned with both types of reporting, the primary concern of this publication is Performance Monitoring, whereas monitoring of Service Quality (or Output-Based Monitoring) will be discussed in detail in the Continual Service Improvement publication.

Date: 2014-12-29