Response selection

At this point in the process, there are a number of response options available. It is important to note that the response options can be chosen in any combination. For example, it may be necessary to preserve the log entry for future reference, but at the same time escalate the event to an Operations Management staff member for action.
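Because the options can be chosen in any combination, one way to picture a selection is as a set of combinable flags. The following is a minimal Python sketch only; the `Response` enum, its member names and the `describe` helper are illustrative assumptions and do not come from any particular Event Management tool.

```python
from enum import Flag, auto


class Response(Flag):
    """Hypothetical response options; each organization will define its own."""
    NONE = 0
    LOG_EVENT = auto()
    AUTO_RESPONSE = auto()
    ALERT_HUMAN = auto()
    OPEN_RFC = auto()
    OPEN_INCIDENT = auto()
    LINK_PROBLEM = auto()


def describe(selection: Response) -> list[str]:
    """Return the names of the individual options contained in a selection."""
    return [r.name for r in Response
            if r is not Response.NONE and r in selection]


# Preserve the log entry and, at the same time, escalate to a staff member.
selection = Response.LOG_EVENT | Response.ALERT_HUMAN
print(describe(selection))
```

Modelling the options as flags rather than as a single value keeps the "any combination" property explicit: adding a further response to a selection never displaces one already chosen.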

The options in the flowchart are examples. Different organizations will have different options, and they are sure to be more detailed. For example, there will be a range of auto responses for each different technology. The process of determining which one is appropriate, and how to execute it, is not represented in this flowchart. Some of the options available are:

  • Event logged: Regardless of what activity is performed, it is a good idea to have a record of the event and any subsequent actions. The event can be logged as an Event Record in the Event Management tool, or it can simply be left as an entry in the system log of the device or application that generated the event. If this is the case, though, there needs to be a standing order for the appropriate Operations Management staff to check the logs on a regular basis, and clear instructions about how to use each log. It should also be remembered that the event information in the logs may not be meaningful until an incident occurs and the Technical Management staff use the logs to investigate where the incident originated. This means that the Event Management procedures for each system or team need to define standards about how long events are kept in the logs before being archived and deleted.
  • Auto response: Some events are understood well enough that the appropriate response has already been defined and automated. This is normally the result of good design or of previous experience (usually Problem Management). The trigger will initiate the action and then evaluate whether it was completed successfully. If not, an Incident or Problem Record will be created. Examples of auto responses include:
    • Rebooting a device
    • Restarting a service
    • Submitting a job into batch
    • Changing a parameter on a device
    • Locking a device or application to protect it against unauthorized access.

Note: locking a device may result in denial of service to authorized users, which could be exploited by a deliberate attacker – so great care should be taken when deciding whether this is an appropriate automated action. Where this response is used it may be prudent to also combine this with a call for human intervention, so that the automated action can be swiftly checked and approved.
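The auto response pattern described above (initiate the action, evaluate whether it completed, raise a record if it did not) can be sketched as follows. This is a minimal illustration under stated assumptions: `AutoResponse`, `Outcome`, `run_auto_response` and the record strings are hypothetical names, and the sketch always raises an incident on failure, whereas a real tool might raise a Problem Record instead.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AutoResponse:
    """A hypothetical automated action plus a check that it worked."""
    name: str
    action: Callable[[], None]        # e.g. restart a service
    verify: Callable[[], bool]        # True if the action completed successfully
    needs_human_check: bool = False   # e.g. locking a device


@dataclass
class Outcome:
    succeeded: bool
    records: list[str] = field(default_factory=list)  # alerts and records raised


def run_auto_response(response: AutoResponse) -> Outcome:
    """Run the action, evaluate it, and raise a record if it did not complete."""
    outcome = Outcome(succeeded=False)
    if response.needs_human_check:
        # Risky actions are combined with a call for human intervention.
        outcome.records.append(f"ALERT: review '{response.name}'")
    try:
        response.action()
        outcome.succeeded = response.verify()
    except Exception as exc:  # the action itself failed to run
        outcome.records.append(f"INCIDENT: '{response.name}' raised {exc!r}")
        return outcome
    if not outcome.succeeded:
        outcome.records.append(f"INCIDENT: '{response.name}' did not complete")
    return outcome
```

Keeping the verification step separate from the action makes it explicit that a command which ran without error has not necessarily achieved its purpose.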

  • Alert and human intervention: If the event requires human intervention, it will need to be escalated. The purpose of the alert is to ensure that the person with the skills appropriate to deal with the event is notified. The alert will contain all the information necessary for that person to determine the appropriate action – including reference to any documentation required (e.g. user manuals). It is important to note that this is not necessarily the same as the functional escalation of an incident, where the emphasis is on restoring service within an agreed time (which may require a variety of activities). The alert requires a person, or team, to perform a specific action, possibly on a specific device and possibly at a specific time, e.g. changing a toner cartridge in a printer when the level is low.
  • Incident, problem or change? Some events will represent a situation where the appropriate response will need to be handled through the Incident, Problem or Change Management process. These are discussed below, but it is important to note that a single incident may initiate any one or a combination of these three processes – for example, a non-critical server failure is logged as an incident, but as there is no workaround, a Problem Record is created to determine the root cause and resolution and an RFC is logged to relocate the workload onto an alternative server while the problem is resolved.
  • Open an RFC: There are two places in the Event Management process where an RFC can be created:
    • When an exception occurs: For example, a scan of a network segment reveals that two new devices have been added without the necessary authorization. A way of dealing with this situation is to open an RFC, which can be used as a vehicle for the Change Management process to deal with the exception (as an alternative to the more conventional approach of opening an incident that would be routed via the Service Desk to Change Management). Investigation by Change Management is appropriate here since unauthorized changes imply that the Change Management process was not effective.
    • Correlation identifies that a change is needed: In this case the event correlation activity determines that the appropriate response to an event is for something to be changed. For example, a performance threshold has been reached and a parameter on a major server needs to be tuned. How does the correlation activity determine this? It was programmed to do so either in the Service Design process or because this has happened before and Problem Management or Operations Management updated the Correlation Engine to take this action.
  • Open an Incident Record: As with an RFC, an incident can be generated immediately when an exception is detected, or when the Correlation Engine determines that a specific type or combination of events represents an incident. When an Incident Record is opened, as much information as possible should be included – with links to the events concerned and, if possible, a completed diagnostic script.
  • Open or link to a Problem Record: It is rare for a Problem Record to be opened without related incidents (this may happen, for example, as a result of a Service Failure Analysis (see Service Design publication) or maturity assessment, or because of a high number of retry network errors, even though a failure has not yet occurred). In most cases this step refers to linking an incident to an existing Problem Record. This will assist the Problem Management teams to reassess the severity and impact of the problem, and may result in a changed priority for an outstanding problem.

However, it is possible, with some of the more sophisticated tools, to evaluate the impact of the incidents and also to raise a Problem Record automatically, where this is warranted, to allow root-cause analysis to commence immediately.
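The linking and automatic raising of Problem Records described above might be sketched as follows. This is an illustration under stated assumptions, not a description of any real tool: `ProblemLinker`, the `cause_key` grouping and the numeric threshold are all hypothetical, and a simple incident count stands in for the impact evaluation that a more sophisticated tool would perform.

```python
from collections import defaultdict


class ProblemLinker:
    """Hypothetical helper that links incidents to Problem Records by cause."""

    def __init__(self, auto_raise_threshold: int = 3):
        # The threshold is an assumed policy value, not taken from the source.
        self.auto_raise_threshold = auto_raise_threshold
        self.unlinked = defaultdict(list)   # cause key -> incident ids
        self.problems = {}                  # cause key -> linked incident ids

    def add_incident(self, incident_id: str, cause_key: str) -> str:
        """Link to an existing Problem Record, or raise one if warranted."""
        if cause_key in self.problems:
            self.problems[cause_key].append(incident_id)
            return "linked"
        self.unlinked[cause_key].append(incident_id)
        if len(self.unlinked[cause_key]) >= self.auto_raise_threshold:
            # Enough related incidents: raise a Problem Record automatically
            # so that root-cause analysis can start immediately.
            self.problems[cause_key] = self.unlinked.pop(cause_key)
            return "problem_raised"
        return "incident_only"
```

The number of incidents linked to each Problem Record is then available to Problem Management when reassessing the priority of an outstanding problem.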



  • Special types of incident: In some cases an event will indicate an exception that does not directly impact any IT service, for example, a redundant air conditioning unit fails, or there is an unauthorized entry to a data centre. Guidelines for these events are as follows:
    • An incident should be logged using an Incident Model that is appropriate for that type of exception, e.g. an Operations Incident or Security Incident (see paragraph 4.2.4.2 for more details of Incident Models).
    • The incident should be escalated to the group that manages that type of incident.
    • As there is no outage, the Incident Model used should reflect that this was an operational issue rather than a service issue. The statistics would not normally be reported to customers or users, unless they can be used to demonstrate that the money invested in redundancy was a good investment.
    • These incidents should not be used to calculate downtime, and can in fact be used to demonstrate how proactive IT has been in making services available.
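The guideline that these incidents are excluded from downtime calculations can be illustrated with a short sketch. The `Incident` fields, the model names in `OPERATIONAL_MODELS` and both helper functions are assumptions made for this example; actual Incident Models are defined by each organization.

```python
from dataclasses import dataclass

# Assumed model names; the text mentions Operations and Security Incidents.
OPERATIONAL_MODELS = {"operations", "security"}


@dataclass
class Incident:
    incident_id: str
    model: str            # Incident Model used to log the incident
    outage_minutes: int   # 0 when no service was interrupted


def service_downtime(incidents: list[Incident]) -> int:
    """Total downtime in minutes, ignoring operational (non-service) incidents."""
    return sum(i.outage_minutes for i in incidents
               if i.model not in OPERATIONAL_MODELS)


def avoided_outages(incidents: list[Incident]) -> int:
    """Count exceptions that were handled without affecting a service."""
    return sum(1 for i in incidents if i.model in OPERATIONAL_MODELS)
```

Reporting the two figures separately supports both uses mentioned above: the downtime figure stays limited to genuine service outages, while the count of operational incidents can show where redundancy prevented one.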

Date: 2014-12-29

