Telecom AdvancedTCA and Carrier Grade Linux
Combine high availability with low total cost of ownership
By: Kirti Devi
Oct. 29, 2004 12:00 AM
Telecommunications service providers face a tough challenge in today's marketplace. To stay competitive and profitable, they must rapidly introduce new voice and data services while continuing to reduce capital expenses and operating costs. This challenge is made even more difficult by the need to deliver next-generation services on reliable carrier-grade equipment that meets rigorous availability, serviceability, and scalability requirements.
To address these pressures, many leaders in the telecommunications industry are moving away from the high costs and slow development times associated with proprietary platform architectures. Instead, they are developing and deploying solutions based on the AdvancedTCA (ATCA) specification, which establishes an open, standards-based modular architecture that reduces the total cost of ownership (TCO) while providing an effective framework for delivering five nines (99.999%) availability. The following trends are driving the shift to ATCA:
ATCA: Carrier-Grade Solutions on Modular Platforms
The ATCA specification was developed by the PCI Industrial Computer Manufacturers Group (PICMG) with active support from over 100 leading telecommunications companies. It defines an open, standards-based board and chassis form factor, interconnect fabric, and management schema that are tailored to meet the high-availability (HA) needs of next-generation converged applications. By supporting multiple interconnect standards, including 10 Gigabit Ethernet, StarFabric, InfiniBand, and PCI Express, ATCA delivers a high level of platform and component flexibility. This enables rapid design using best-of-breed building blocks. By defining a unified and consistent management framework, it also helps to support HA requirements and reduce operating costs across the network.
Built-in Support for HA
Downtimes of about five minutes per year or less are the norm for mission-critical telecom applications. The ATCA specification is designed to meet these requirements via the following:
Standards-Based System Management
The ATCA specification devotes a great deal of attention to system management. The management protocol is based on the Intelligent Platform Management Interface (IPMI) v1.5, which is extended to address carrier-grade HA requirements. A dedicated and fully redundant Intelligent Platform Management Bus (IPMB) carries messages across the backplane via a two-wire serial interconnect. This helps to ensure a standards-based management methodology that improves availability while reducing costs.
To enable centralized, scalable, out-of-band management, ATCA establishes a shelf manager that acts as the central management authority in the system. This shelf manager monitors and controls the operation of boards and other components. It does so by maintaining a health counter that reports anomalies and takes corrective action when required. Key functions include managing power and cooling resources to maintain safe operating conditions, managing hot-swap component replacement, monitoring backplane interconnection types to prevent incompatible interconnections, and retrieving inventory information from shelf components to simplify replacement and reduce human error.
Every field-replaceable unit (FRU) has its own Intelligent Platform Management (IPM) controller that monitors its operations and health and reports to the shelf manager. There is also a higher-level, external manager that is in charge of managing multiple shelves within the telco office.
The Chassis Management Module
One example of an ATCA-compliant shelf manager is the Intel NetStructure Chassis Management Module (CMM). It performs all the functions described previously. It also has event logging and alarm capabilities for multiple node and/or fabric slots, as well as for system power entry modules and fan trays.
Based on changing health conditions, different mechanisms can be employed to monitor components and send alarms.
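As a rough illustration of how changing health conditions might translate into alarms, the sketch below classifies sensor readings against thresholds and reports the worst severity found. The sensor names, threshold values, severity levels, and function names are all assumptions made for illustration; none of them is taken from the ATCA specification or from the CMM software.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Tuple


class Severity(IntEnum):
    """Illustrative alarm levels, ordered so that max() picks the worst."""
    OK = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical upper limits for one sensor (for example, degrees C)."""
    minor: float
    major: float
    critical: float


def classify(reading: float, limits: Thresholds) -> Severity:
    """Map one reading to a severity by checking the highest limit first."""
    if reading >= limits.critical:
        return Severity.CRITICAL
    if reading >= limits.major:
        return Severity.MAJOR
    if reading >= limits.minor:
        return Severity.MINOR
    return Severity.OK


def shelf_alarm(readings: Dict[str, float],
                limits: Dict[str, Thresholds]) -> Tuple[Severity, List[str]]:
    """Return the worst severity seen and the sensors that are not OK."""
    worst = Severity.OK
    offenders = []
    for sensor, value in sorted(readings.items()):
        level = classify(value, limits[sensor])
        if level is not Severity.OK:
            offenders.append(sensor)
        worst = max(worst, level)
    return worst, offenders


if __name__ == "__main__":
    # Made-up sensors and limits, purely for demonstration.
    limits = {
        "fan_tray_1_temp": Thresholds(minor=60.0, major=75.0, critical=90.0),
        "blade_3_temp": Thresholds(minor=60.0, major=75.0, critical=90.0),
    }
    readings = {"fan_tray_1_temp": 48.0, "blade_3_temp": 78.5}
    print(shelf_alarm(readings, limits))
```

A real shelf manager would receive such readings as IPMI events over the IPMB rather than from a dictionary; the point here is only the mapping from health data to an alarm level.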
Both the hardware and software of the CMM are designed for adaptability, so solution designers can extend the management software stack to address application-specific requirements. To enable seamless failover of the management function if the active CMM fails, two CMMs can be configured in an active-standby mode to provide redundancy. Essential information is constantly synchronized between the two devices. This includes IPMB state information, system event logs, health information, CMM configuration information, user names and passwords, power and hot-swap information, cooling parameters, and the electronic-keying information that is used to prevent incompatible boards from connecting with the backplane. Communications between the two CMMs typically take place over dedicated, redundant management buses, or they can be configured to communicate across redundant Ethernet connections.
HA Middleware
The keys to HA are redundancy, configuration management, and cluster management. HA middleware is the controlling entity for monitoring and managing redundant, clustered resources and configurations. The HA middleware is an integral part of the software stack running on an ATCA-compliant blade (see Figure 1). It maintains a system model of the components that comprise the cluster, defines how faults are detected, and instigates appropriate actions (e.g., reconfiguring the cluster or notifying and restarting affected application components). Because the members of each cluster may be different, and may be running different OSs and HA middleware, it's essential that the solution be based on industry standards that enable interoperability in heterogeneous environments.
To deliver appropriate levels of availability, the HA middleware must do the following:
In a typical fault management cycle, the HA middleware:
CGL: A Hardened, Standards-Based OS
To enable truly carrier-grade solutions, the OS must be as robust as the hardware and middleware, and also be integrated within the complete HA solution. To address this requirement, the Open Source Development Labs (OSDL) Carrier Grade Linux (CGL) Working Group has defined additions to Linux that specifically address telecommunications requirements. These extensions have been designed to support ATCA-based solutions and to support standards-based HA middleware interfaces.
Redundancy Support
Redundant boards and rapid failover are critical for ensuring HA in an ATCA-compliant platform. The OS provides two key functions that are important in supporting this capability: Ethernet connectivity and redundant data storage.
In ATCA-compliant boards, Ethernet connectivity is typically provided by redundant Network Interface Controllers (NICs) that are configured to appear as a single virtual interface using a technique called "Ethernet bonding." If one NIC fails, the remaining bonded NIC(s) take over transparently. Support for Ethernet bonding is a key CGL requirement, and solutions are currently available. These include the bonding driver developed by the SourceForge "Channel Bonding" project and the Intel Advanced Networking Services Driver (iANS). With these drivers, bonded NICs can be configured in a variety of modes, ranging from simple failover to a standby NIC, to a high-bandwidth "link aggregation" mode, in which all bonded NICs are simultaneously active.
To achieve 99.999% or better availability, it's critical that disk data corruption and disk failures do not lead to system downtime. One way to meet this requirement is by employing a Redundant Array of Inexpensive Disks (RAID) system. The OSDL CGL requires support for software RAID 1 (disk mirroring), which maintains duplicate sets of all data on separate disk drives. RAID 1 also allows booting from a mirrored drive if the primary disk drive fails.
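To put the 99.999% target in concrete terms, the following minimal sketch converts an availability figure into a yearly downtime budget and estimates the combined availability of mirrored components. The function names are hypothetical, and the redundancy formula assumes that failures are independent and that failover is instantaneous, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime implied by an availability fraction such as 0.99999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def mirrored_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` redundant units, where service survives as
    long as at least one unit is up.

    Assumes independent failures and instant failover (an idealization).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes/year")
    # Two mirrored drives, each individually at three nines.
    pair = mirrored_availability(0.999, copies=2)
    print(f"mirrored pair: {pair:.6f}")
```

Under these assumptions, five nines corresponds to a budget of roughly 5.26 minutes of downtime per year, and two mirrored units that each reach only three nines combine to about six nines. Correlated failures, such as a shared power feed or controller, reduce that gain in practice.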
In addition, ATCA-compliant systems require OS support for hot swapping of defective disk drives (or entire storage boards) and for resynching the replacement drive or board after insertion, without having to take down the system. The Linux Multiple Devices (md) driver, along with user-space RAID tools, provides these essential features.
Hot Swap Support
To enable hot swapping of individual devices and entire boards within an ATCA-compliant chassis, the OS must support persistent device naming. To fulfill this need, a CGL-conforming OS provides a policy-based subsystem for the naming of transient components. This allows long-running HA applications to gracefully deal with changing resources, without restarting applications or rebooting the OS.
HA Middleware Support
HA middleware relies on a set of services provided by the OS, including full support for IPMI 1.5 and for the Hardware Platform Interface defined by the Service Availability Forum (SAF). Without IPMI support, solution providers would be forced to depend on costly custom software stacks to create complete HA solutions. A CGL-conforming OS includes an IPMI driver that gives software developers a direct connection to each blade's IPMI controller. It also provides a framework for enabling system management software, and for reporting critical OS errors to the shelf manager via the IPMB. This enables the shelf manager to take appropriate action quickly to maintain services.
The SAF is an industry consortium of major players in the telecommunications industry. The goal of the SAF is to develop a framework and specifications for service availability, so highly available carrier-grade systems can be developed using off-the-shelf platforms, middleware, and applications. The SAF has published the Hardware Platform Interface, which is the programming interface between the HA middleware and the platform.
Members of the forum are working together to create multiple implementations of the specification. A CGL-conforming OS provides a compliant hardware platform interface, so it can interface seamlessly with standardized HA middleware and applications. This enables additional layers of protection to be built on top of IPMI, utilizing multiple communication paths (IPMB, Ethernet, etc.) as needed to meet the extreme demands of carrier environments.
The TCO Advantages of CGL
As an open, standards-based OS, CGL complements the cost advantages of ATCA, enabling a consistent, carrier-grade operating environment that supports a wide range of telecommunications architectures and applications. Multiple vendors offer CGL distributions based on the freely available Linux kernel, so pricing is competitive and developers have considerable flexibility in choosing the distribution that best meets their specific feature and cost requirements.
The extended feature set of CGL-compliant OSs also reduces the need for users to develop and maintain custom solutions. A large number of the CGL requirements have already been implemented as open source projects. Other requirements are a more natural fit for proprietary implementations based on open interfaces, and this is fueling a growing ecosystem of independent software vendors. This combination of software development models is resulting in rapid development and extensive, off-the-shelf options that can support very diverse requirements, while reducing overall costs.
CGL requirements also focus on serviceability, which reduces TCO by ensuring that CGL systems can be maintained, upgraded, and debugged quickly. Support for the Simple Network Management Protocol (SNMP) is one example that can reduce costs by simplifying integration with existing management frameworks. In addition, CGL requirements help to ensure that sufficient information is available in multiple locations (via kernel, system, and application dumps) for debugging system failures.
CGL also includes a variety of features that help to accelerate problem resolution, improve support for security solutions, and enable administrators to debug a live system. In combination with the wide availability of administrators who are familiar with the Linux environment, these tools and capabilities can dramatically reduce both system downtime and total costs.
Summary
To sustain profitability in today's market, telecommunications service providers need solutions that combine HA (99.999%) with a low TCO. In conjunction with CGL and HA middleware, ATCA provides an open, standards-based, carrier-grade solution that meets these requirements.
Building blocks are available today, including multiple CGL distributions that can be combined with off-the-shelf or custom HA middleware to build comprehensive telecommunications solutions that are more affordable, scalable, and adaptable than traditional, proprietary architectures. These systems can be developed more quickly, and they typically deliver better performance in a smaller footprint. They can also be designed with full redundancy, quick failover mechanisms, and advanced serviceability features that meet the HA requirements of critical telecommunications applications.