Reliability, Availability, and Serviceability

To ensure the maximum reliability of any highly available solution, ITMS Ltd’s architects will make sure, subject to budget constraints, that every area of the solution, whether it is a server cluster or a Local Area Network backbone, has the right mix of hardware and software.

This hardware and software addresses the need of larger organisations, driven by operating constraints of one kind or another, for highly available computing environments. Technologies used in this area include clustering, storage area networks, backup silos, data replication and equipment redundancy.

Sun Microsystems described these concerns well under the heading of Reliability, Availability and Serviceability (RAS). Although the following information was written about the Sun Microsystems hardware and operating system environment, the approaches can be applied to other mainstream systems such as Microsoft Windows Server and Linux-based servers.

Excerpts from Sun’s Reliability, Availability, and Serviceability (RAS) Data Sheet

Computer downtime can cost your company thousands, or even millions, of dollars in lost revenues and productivity. To dramatically reduce downtime, the Solaris Operating Environment provides data centre-class reliability, availability, and serviceability (RAS) at a fraction of a mainframe’s cost.

Reliability, Availability, and Serviceability (RAS)

Features of the Solaris Operating Environment, combined with products in Solaris Enterprise Server, can help you accomplish three important RAS goals:

  • Minimize unplanned downtime
  • Minimize planned downtime
  • Recover rapidly after a failure

Minimizing Unplanned Downtime

The latest release of the Solaris Operating Environment includes a number of features designed to help keep your network up and running:

Dynamic Reconfiguration

Dynamic reconfiguration allows a system to continue running when a system board fails. You can even replace the faulty board while the system is in operation.

Improved Device Configuration Library

The libdevinfo library, used to obtain device configuration information, is more robust and reliable in the Solaris Operating Environment. This improvement reduces unplanned downtime by allowing applications to retrieve more stable and consistent device configuration data.

Solaris Enterprise Server

The following components of Solaris Enterprise Server software (some also available separately) offer additional RAS features:

Clustering

Sun Cluster software connects up to four servers as if they were one system. Clustering improves reliability by permitting applications and services to move transparently from one system in the cluster to another in the event that a failure occurs.
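As a rough illustration of the failover idea, the following Python sketch models a small cluster in which services move to a surviving node when their current node fails. The class and method names (Cluster, start, fail_node) are hypothetical and are not part of Sun Cluster; real cluster software also deals with heartbeats, shared storage and quorum, all of which are left out here.

```python
class Cluster:
    """Toy model of service failover; not Sun Cluster's actual API."""

    def __init__(self, nodes):
        # Every node starts healthy; no services are placed yet.
        self.healthy = {node: True for node in nodes}
        self.placement = {}  # service name -> node currently hosting it

    def start(self, service, node):
        if not self.healthy.get(node, False):
            raise ValueError(f"node {node!r} is not available")
        self.placement[service] = node

    def fail_node(self, failed):
        """Mark a node as failed and move its services to healthy nodes."""
        self.healthy[failed] = False
        survivors = [n for n, ok in self.healthy.items() if ok]
        for service, node in self.placement.items():
            if node == failed:
                if not survivors:
                    raise RuntimeError("no healthy node left for failover")
                # Choose the survivor that currently hosts the fewest services.
                self.placement[service] = min(survivors, key=self._load)

    def _load(self, node):
        return sum(1 for n in self.placement.values() if n == node)


cluster = Cluster(["node1", "node2", "node3"])
cluster.start("database", "node1")
cluster.start("web", "node1")
cluster.fail_node("node1")   # both services are moved off node1
print(cluster.placement)
```

Spreading the displaced services across the least-loaded survivors is only one possible placement policy, chosen here to keep the sketch short.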

Bandwidth Allocation

With Solaris Bandwidth Manager software, you can prioritize network traffic, preventing a small number of applications or users from consuming all available bandwidth. It enables you to ensure high-quality service to everyone in your enterprise.
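The effect of prioritizing traffic can be illustrated with a generic weighted fair-share calculation. The Python sketch below is not the algorithm used by Solaris Bandwidth Manager, which the data sheet does not describe; the function name, the traffic classes and the weights are all assumptions made for the example.

```python
def allocate_bandwidth(capacity, demands, weights):
    """Weighted max-min fair share of a link (illustrative only).

    capacity: total bandwidth available.
    demands, weights: dicts keyed by traffic class name.
    """
    allocation = {name: 0.0 for name in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 0:
        total_weight = sum(weights[n] for n in active)
        share = {n: remaining * weights[n] / total_weight for n in active}
        # Classes that ask for no more than their share get their full demand.
        satisfied = {n for n in active if demands[n] <= share[n]}
        if not satisfied:
            # Everyone wants more than their share: cap each at its share.
            for n in active:
                allocation[n] = share[n]
            break
        for n in satisfied:
            allocation[n] = float(demands[n])
            remaining -= demands[n]
        active -= satisfied  # leftover capacity is re-shared next round
    return allocation


# Hypothetical traffic classes on a 100-unit link.
demands = {"backup": 90, "email": 10, "database": 40}
weights = {"backup": 1, "email": 1, "database": 2}
result = allocate_bandwidth(100, demands, weights)
print(result)
```

In this example the backup job asks for most of the link but is held to what is left after the smaller and higher-weighted classes have been served, so no single class can starve the others.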

Resource Management

Solaris Resource Manager improves reliability by balancing system performance and ensuring that mission-critical applications have access to needed resources.

Minimizing Planned Downtime

Improved Hot-Plug Capability

The hot-plug capability of the Solaris Operating Environment enables you to add or remove cards and subsystems while still online. Improved interfaces in the Solaris Operating Environment software allow better control and coordination of device reconfiguration during this process.

Solaris Online Upgrade

Solaris Online Upgrade allows you to upgrade from Solaris 2.6 software or later to new versions of the operating environment without taking the system offline. This feature will be available in the Solaris 8 Operating Environment.

Rapid Recovery

Improved Core Dump Analysis

The Solaris Operating Environment includes the ability to make dumps configurable and compress dump data when writing to a dump device. Other enhancements make the core dump process more robust so you can determine sources of failure more quickly.

Traceroute Utility

By tracing the route an IP packet follows to an Internet host, the traceroute utility allows you to quickly diagnose and correct routing misconfigurations and routing path failures.

Kernel Debugging Enhancements

The Solaris Operating Environment includes enhancements to kadb (for live system debugging) and adb (for crash dump analysis) to improve the troubleshooting process.

Improved Logging of Kernel Events and Errors

More effective logging of kernel events and errors improves serviceability by providing both valuable warnings and additional system information to administrators.

Remote Console

Available in the first half of 1999, remote console capability will allow a remote service technician to diagnose problems and perform maintenance in the same manner as an on-site administrator.

UFS Logging

UNIX® file system logging minimizes unplanned downtime by recording UFS updates in a log before the updates are applied to the file system. UFS logging eliminates inconsistencies and makes rebooting much faster.
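The log-before-apply idea can be sketched in a few lines of Python. This is a simplified model of write-ahead logging in general, not the UFS implementation: the JournaledStore class, its fields and the in-memory dictionary standing in for the disk are all assumptions made for illustration.

```python
class JournaledStore:
    """Toy write-ahead log: record each update before applying it."""

    def __init__(self):
        self.log = []    # updates recorded but not yet applied
        self.data = {}   # stands in for the on-disk file system state

    def write(self, key, value, crash_before_apply=False):
        # Step 1: record the intended update in the log.
        self.log.append((key, value))
        if crash_before_apply:
            return  # simulate a failure after logging, before applying
        # Step 2: apply the update, then retire its log entry.
        self.data[key] = value
        self.log.pop()

    def recover(self):
        """Replay logged updates rather than checking the whole store."""
        for key, value in self.log:
            self.data[key] = value
        self.log.clear()


store = JournaledStore()
store.write("a.txt", "v1")
store.write("b.txt", "v1", crash_before_apply=True)  # left in the log
store.recover()                                       # replayed at reboot
print(store.data)
```

Because recovery only replays the short log instead of examining every structure, a store built this way can return to a consistent state quickly after a failure.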

Availability

Solaris Features

RAS features are included in the Solaris Operating Environment. Online Upgrade and Remote Console will be available as specified; all other features are now shipping as part of the Solaris Operating Environment.

Please contact us for more information: info@itmsltd.net
