Availability Management Explained


Availability Management is an IT Service Management (ITSM) process area.

What is Availability?

Availability is usually measured against the agreed service time, taking into account any downtime within it. Availability Management is performed at two interconnected levels:

Service Availability

    • Involves all aspects of service availability and unavailability, including the impact of component availability on service availability.

Component Availability

    • Involves all aspects of component availability and unavailability.
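The two levels are linked because a service is only as available as the components it depends on. The following is a minimal sketch, assuming component failures are independent and availabilities are given as fractions (0.999 rather than 99.9%). Components that must all work are combined in series, while redundant components are combined in parallel. The function names and the example services are hypothetical.

```python
from math import prod


def series_availability(components: list[float]) -> float:
    """Every component must be up, so multiply their availabilities."""
    return prod(components)


def parallel_availability(components: list[float]) -> float:
    """Redundant components: the service is down only if all are down at once."""
    return 1 - prod(1 - a for a in components)


# Hypothetical service: web server -> application server -> database, each 99.9%.
print(f"{series_availability([0.999, 0.999, 0.999]):.6f}")  # 0.997003
# Hypothetical pair of redundant servers, each 99%.
print(f"{parallel_availability([0.99, 0.99]):.6f}")  # 0.999900
```

Chaining three 99.9% components already pulls the service below 99.9%, which is why component availability has to be managed with the service target in mind.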

Availability is usually calculated as a percentage and depends on the following elements:

Reliability

A measure of how long a Configuration Item or IT Service can perform its agreed Function without interruption. Reliability is usually measured as MTBF or MTBSI.

MTBF – Mean Time Between Failures

MTBSI – Mean Time Between Service Incidents
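Both measures can be derived from a log of outages. This is a sketch, assuming each outage is recorded as a hypothetical (failed_at, restored_at) pair in hours since the start of the observation period, and that at least two outages are present. MTBF averages the uptime from one restoration to the next failure, while MTBSI averages the time from one failure to the next, so it also includes the time spent restoring service.

```python
from statistics import mean

# Hypothetical record shape: (failed_at, restored_at), in hours.
Outage = tuple[float, float]


def mtbf(outages: list[Outage]) -> float:
    """Mean uptime from one restoration to the next failure."""
    ordered = sorted(outages)
    uptimes = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return mean(uptimes)


def mtbsi(outages: list[Outage]) -> float:
    """Mean time from one failure to the next (uptime plus restoration time)."""
    ordered = sorted(outages)
    gaps = [nxt[0] - prev[0] for prev, nxt in zip(ordered, ordered[1:])]
    return mean(gaps)


outages = [(100.0, 102.0), (300.0, 304.0), (500.0, 501.0)]
print(mtbf(outages))   # 197.0
print(mtbsi(outages))  # 200.0
```

The sketch ignores the uptime before the first failure and after the last restoration; a fuller version might treat those open-ended intervals explicitly.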

Maintainability

A measure of how quickly and effectively a Configuration Item or IT Service can be restored to normal working after a Failure. Maintainability is often measured and reported as MTRS.

MTRS – Mean Time to Restore Service
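MTRS comes from the same kind of outage log, and combined with MTBF it gives a long-run estimate of availability. This is a sketch using a hypothetical (failed_at, restored_at) record in hours; the ratio MTBF / (MTBF + MTRS) assumes the item simply alternates between working and being restored.

```python
from statistics import mean


def mtrs(outages: list[tuple[float, float]]) -> float:
    """Mean Time to Restore Service: the average outage duration."""
    return mean(restored_at - failed_at for failed_at, restored_at in outages)


def steady_state_availability(mtbf_hours: float, mtrs_hours: float) -> float:
    """Long-run fraction of time the item is working: MTBF / (MTBF + MTRS)."""
    return mtbf_hours / (mtbf_hours + mtrs_hours)


outages = [(100.0, 102.0), (300.0, 304.0), (500.0, 503.0)]
restore = mtrs(outages)
print(restore)                                    # 3.0
print(steady_state_availability(197.0, restore))  # 0.985
```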

Serviceability

The ability of a Third-Party Supplier to meet the terms of its Contract. This Contract will include agreed levels of Reliability, Maintainability or Availability for a Configuration Item.
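Checking serviceability amounts to comparing what a supplier actually delivered against the levels written into its Contract. The following is a sketch with a hypothetical SupplierTerms record; the field names and the thresholds in the example are illustrative, not taken from any real contract.

```python
from dataclasses import dataclass


@dataclass
class SupplierTerms:
    """Hypothetical contracted levels for one Configuration Item."""
    min_availability_pct: float
    min_mtbf_hours: float
    max_mtrs_hours: float


def contract_breaches(terms: SupplierTerms, availability_pct: float,
                      mtbf_hours: float, mtrs_hours: float) -> list[str]:
    """Return the names of contracted levels the supplier failed to meet."""
    breaches = []
    if availability_pct < terms.min_availability_pct:
        breaches.append("availability")
    if mtbf_hours < terms.min_mtbf_hours:
        breaches.append("reliability (MTBF)")
    if mtrs_hours > terms.max_mtrs_hours:
        breaches.append("maintainability (MTRS)")
    return breaches


terms = SupplierTerms(min_availability_pct=99.5, min_mtbf_hours=150.0, max_mtrs_hours=4.0)
print(contract_breaches(terms, 99.7, 197.0, 5.5))  # ['maintainability (MTRS)']
```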

Objectives

The goal of the Availability Management (AM) process is to ensure that the level of service availability delivered in all services matches or exceeds the current and future agreed needs of the business, in a cost-effective manner. To achieve this, the process:

  • Produces and maintains an appropriate and up-to-date Availability Plan, which reflects the current and future needs of the business.
  • Ensures that service availability achievements meet or exceed all of their agreed targets by managing the services and resources that are related to availability performance.
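Checking achievements against agreed targets usually relies on the availability formula commonly given in ITIL: Availability (%) = (agreed service time − downtime) / agreed service time × 100. The sketch below applies it and also inverts it to show how much downtime a target allows. The function names and the 30-day, 24x7 example period are assumptions for illustration.

```python
def availability_percent(agreed_service_time: float, downtime: float) -> float:
    """Availability (%) = (AST - downtime) / AST * 100; both in the same unit."""
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must be between 0 and the agreed service time")
    return (agreed_service_time - downtime) / agreed_service_time * 100


def downtime_budget(target_pct: float, agreed_service_time: float) -> float:
    """Largest downtime (same unit as AST) that still meets the target."""
    return agreed_service_time * (1 - target_pct / 100)


# Hypothetical month: 24x7 service over 30 days = 43,200 agreed minutes, 99.9% target.
ast_minutes = 30 * 24 * 60
print(round(downtime_budget(99.9, ast_minutes), 1))     # 43.2
print(round(availability_percent(ast_minutes, 30), 3))  # 99.931
print(availability_percent(ast_minutes, 30) >= 99.9)    # True
```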

Basic Concepts

Availability Management should perform both reactive and proactive activities.

Reactive Activities

  • Monitor, measure, analyze, report and review service and component availability
  • Investigate all service and component unavailability and instigate remedial action
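The reactive activities can start from something as simple as totalling downtime per component, so that investigation begins with the largest contributors to unavailability. This is a sketch, assuming incidents are available as hypothetical (component, minutes_down) pairs; a real implementation would read them from monitoring or incident records.

```python
from collections import defaultdict


def unavailability_report(incidents: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Total downtime per component, largest first."""
    totals: defaultdict[str, float] = defaultdict(float)
    for component, minutes_down in incidents:
        totals[component] += minutes_down
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


incidents = [("database", 30.0), ("load balancer", 5.0), ("database", 45.0), ("app server", 12.0)]
for component, minutes in unavailability_report(incidents):
    print(f"{component}: {minutes:.0f} min")
```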

Proactive Activities

  • Risk assessment and management
  • Review all new and changed services and test all availability and resilience mechanisms