Reliability, Availability, and Serviceability
To ensure the maximum reliability of any highly available solution, ITMS Ltd’s architects will make sure, subject to budget constraints, that every area of the solution, whether it is a server cluster or a Local Area Network backbone, has the right mix of hardware and software.
This hardware and software addresses the need of larger organisations for highly available computing environments, a need that usually arises from operating constraints of one kind or another. Technologies used in this area include clustering, storage area networks, backup silos, data replication and equipment redundancy.
Sun Microsystems described this very well as issues of Reliability, Availability and Serviceability (RAS). Although the following information was written about Sun Microsystems' hardware and operating system environment, the approaches can be applied to other mainstream systems such as Microsoft Windows Server and Linux-based servers.
Excerpts from Sun's Reliability, Availability, and Serviceability (RAS) Data Sheet
Computer downtime can cost your company thousands, or even millions, of dollars in lost revenues and productivity. To dramatically reduce downtime, the Solaris Operating Environment provides data centre-class reliability, availability, and serviceability (RAS) at a fraction of a mainframe’s cost.
Reliability, Availability, and Serviceability (RAS)
Features of the Solaris Operating Environment, combined with products in Solaris Enterprise Server, can help you accomplish three important RAS goals:
Minimize unplanned downtime
Minimize planned downtime
Recover rapidly after a failure
Minimizing Unplanned Downtime
The latest release of the Solaris Operating Environment includes a number of features designed to help keep your network up and running:
Dynamic reconfiguration allows a system to continue running when a system board fails. You can even replace the faulty board while the system is in operation.
Improved Device Configuration Library
The libdevinfo library, used to obtain device configuration information, is more robust and reliable in the Solaris Operating Environment. This improvement reduces unplanned downtime by allowing applications to retrieve more stable and consistent device configuration data.
Solaris Enterprise Server
The following components of Solaris Enterprise Server software (some also available separately) offer additional RAS features:
Sun Cluster software connects up to four servers as if they were one system. Clustering improves reliability by permitting applications and services to move transparently from one system in the cluster to another in the event of a failure.
With Solaris Bandwidth Manager software, you can prioritize network traffic, preventing a small number of applications or users from consuming all available bandwidth. It enables you to ensure high-quality service to everyone in your enterprise.
Solaris Resource Manager improves reliability by balancing system performance and ensuring that mission-critical applications have access to needed resources.
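The failover behaviour described for Sun Cluster above can be illustrated with a minimal Python sketch. It is not Sun Cluster's interface: the `Node` and `Cluster` classes, the `fail` and `locate` methods, and the "least-loaded survivor" placement rule are all assumptions made for illustration. Only the four-server limit comes from the data sheet.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member; not a Sun Cluster data structure."""
    name: str
    healthy: bool = True
    services: list = field(default_factory=list)


class Cluster:
    MAX_NODES = 4  # limit quoted in the data sheet

    def __init__(self, nodes):
        if not 1 <= len(nodes) <= self.MAX_NODES:
            raise ValueError("a cluster holds between 1 and 4 nodes")
        self.nodes = list(nodes)

    def fail(self, name):
        """Mark a node as failed and move its services to surviving nodes."""
        failed = next(n for n in self.nodes if n.name == name)
        failed.healthy = False
        survivors = [n for n in self.nodes if n.healthy]
        if not survivors:
            raise RuntimeError("no healthy node left to take over")
        while failed.services:
            # Assumed placement rule: pick the survivor running the fewest services.
            target = min(survivors, key=lambda n: len(n.services))
            target.services.append(failed.services.pop(0))

    def locate(self, service):
        """Return the name of the healthy node running the service, or None."""
        for node in self.nodes:
            if node.healthy and service in node.services:
                return node.name
        return None
```

A client that always asks `locate("db")` before connecting keeps working after `fail("a")`, which is the sense in which the move is transparent to users of the service.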
Minimizing Planned Downtime
Improved Hot-Plug Capability
The hot-plug capability of the Solaris Operating Environment enables you to add or remove cards and subsystems while the system is still online. Improved interfaces in the Solaris Operating Environment software allow better control and coordination of device reconfiguration during this process.
Solaris Online Upgrade
Solaris Online Upgrade allows you to upgrade from Solaris 2.6 software or later to new versions of the operating environment without taking the system offline. This feature will be available in the Solaris 8 Operating Environment.
Improved Core Dump Analysis
The Solaris Operating Environment includes the ability to make dumps configurable and compress dump data when writing to a dump device. Other enhancements make the core dump process more robust so you can determine sources of failure more quickly.
By tracing the route an IP packet follows to an Internet host, the traceroute utility allows you to quickly diagnose and correct routing misconfigurations and routing path failures.
Kernel Debugging Enhancements
The Solaris Operating Environment includes enhancements to kadb (for live system debugging) and adb (for crash dump analysis) to improve the troubleshooting process.
Improved Logging of Kernel Events and Errors
More effective logging of kernel events and errors improves serviceability by providing both valuable warnings and additional system information to administrators.
Available in the first half of 1999, remote console capability will allow a remote service technician to diagnose problems and perform maintenance in the same manner as an on-site administrator.
UNIX® file system logging minimizes unplanned downtime by recording UFS updates in a log before the updates are applied to the file system. UFS logging eliminates inconsistencies and makes rebooting much faster.
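The idea behind logging updates before applying them can be shown with a toy write-ahead log in Python. This is a generic sketch, not the UFS on-disk format, which the data sheet does not describe. The `LoggedStore` class, its JSON log lines, the `crash_before_apply` flag and the `demo` helper are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


class LoggedStore:
    """Toy key-value store that records each update in a log before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}

    def write(self, key, value, crash_before_apply=False):
        # Step 1: append the intended update to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        if crash_before_apply:
            return  # simulate a failure between logging and applying
        # Step 2: apply the update to the main structure.
        self.data[key] = value

    def recover(self):
        # Replay the log in order; no full consistency scan is needed.
        self.data = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]


def demo():
    """Write two updates, 'crash' during the second, then recover from the log."""
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    store = LoggedStore(path)
    store.write("a", 1)
    store.write("b", 2, crash_before_apply=True)
    restarted = LoggedStore(path)
    restarted.recover()
    return store.data, restarted.data
```

In `demo`, the second update never reaches the first store's data, yet the restarted store recovers both updates by replaying the log. Recovery work is proportional to the size of the log rather than the size of the whole store, which is why a logged file system can come back up quickly after a failure.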
RAS features are included in the Solaris Operating Environment. Online Upgrade and Remote Console will be available as specified; all other features are now shipping as part of the Solaris Operating Environment.