AdvancedTCA and Carrier Grade Linux
Combine high availability with low total cost of ownership

Telecommunications service providers face a tough challenge in today's marketplace. To stay competitive and profitable, they must rapidly introduce new voice and data services while continuing to reduce capital expenses and operating costs. This challenge is made even more difficult by the need to deliver next-generation services on reliable carrier-grade equipment that meets rigorous availability, serviceability, and scalability requirements.

To address these pressures, many leaders in the telecommunications industry are moving away from the high costs and slow development times associated with proprietary platform architectures. Instead, they are developing and deploying solutions based on the AdvancedTCA (ATCA) specification. This specification establishes an open, standards-based modular architecture that reduces the total cost of ownership (TCO) while providing an effective framework for delivering five nines (99.999%) availability.

The following trends are driving the shift to ATCA:

  • The industry is moving to standards-based platforms with standardized carrier-grade OSs and middleware interfaces. This reduces long-term platform development costs and provides telecommunications equipment manufacturers (TEMs) with the broadest available choice of vendors.
  • To reduce development costs and time-to-market, TEMs are deploying common platforms for converged computing and communications applications.
  • The use of ATCA platforms enables TEMs to shift away from hardware development and refocus their efforts on supplying service-provider customers with value-added software and services.

To complement the ATCA specification, the Carrier Grade Linux (CGL) Working Group of the Open Source Development Labs (OSDL) has defined a series of Linux requirements that extend the Linux OS for carrier-grade environments. In combination with appropriate management software and high availability (HA) middleware, ATCA and CGL enable complete, carrier-grade infrastructure solutions using commercial hardware and software building blocks. The result is a modular approach that delivers on the promise of HA with reduced TCO, while improving performance, scalability, and adaptability.

ATCA: Carrier-Grade Solutions on Modular Platforms

The ATCA specification was developed by the PCI Industrial Computer Manufacturers Group (PICMG) with active support from over 100 leading telecommunications companies. It defines an open, standards-based board and chassis form factor, interconnect fabric, and management schema that are tailored to meet the HA needs of next-generation converged applications. By supporting multiple interconnect standards, including 10 Gigabit Ethernet, StarFabric, InfiniBand, and PCI Express, ATCA delivers a high level of platform and component flexibility. This enables rapid design using best-of-breed building blocks. By defining a unified and consistent management framework, it also helps to support HA requirements and reduce operating costs across the network.

Built-in Support for HA

Downtime of roughly five minutes per year or less, the budget implied by five nines (99.999%) availability, is the norm for mission-critical telecom applications. The ATCA specification is designed to meet these requirements via the following:
  • No single point of failure: Support for full redundancy is built into the specification.
  • Hot swap capabilities: All field replaceable units (FRUs) can be replaced without interrupting critical platform operations.
  • Built-in manageability support: Systems can be constantly monitored and managed to anticipate and prevent failures. All management mechanisms are designed to be "out-of-band," so they don't interfere with essential functionality.
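
The five nines target translates into a concrete downtime budget, and the arithmetic also shows why redundancy matters. The sketch below is illustrative only: the function names are made up for this example, and the redundancy formula assumes that the redundant copies fail independently, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def allowed_downtime_minutes(availability_percent):
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


def series_availability(*component_percents):
    """Availability of a chain in which every component must work."""
    result = 1.0
    for percent in component_percents:
        result *= percent / 100.0
    return result * 100.0


def redundant_availability(component_percent, copies):
    """Availability of N redundant copies where any single copy suffices.

    Assumes the copies fail independently (an idealization).
    """
    all_fail = (1.0 - component_percent / 100.0) ** copies
    return (1.0 - all_fail) * 100.0


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
    print(f"two 99% units in parallel: {redundant_availability(99.0, 2):.4f}%")
```

Under these assumptions, 99.999% availability leaves a budget of a little over five minutes per year, and two independent 99% components in parallel reach 99.99%, which is why removing single points of failure is the first item on the list.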

Standards-Based System Management

The ATCA specification devotes a great deal of attention to system management. The management protocol is based on the Intelligent Platform Management Interface (IPMI) v1.5, which is extended to address carrier-grade HA requirements. A dedicated and fully redundant Intelligent Platform Management Bus (IPMB) carries messages across the backplane via a two-wire serial interconnect. This helps to ensure a standards-based management methodology that improves availability while reducing costs.

To enable centralized, scalable, out-of-band management, ATCA establishes a shelf manager that acts as the central management authority in the system. This shelf manager monitors and controls the operation of boards and other components. It does so by maintaining a health counter that reports anomalies and takes corrective action when required. Key functions include managing power and cooling resources to maintain safe operating conditions, managing hot-swap component replacement, monitoring backplane interconnection types to prevent incompatible interconnections, and retrieving inventory information from shelf components to simplify replacement and reduce human error.

FRU

Every FRU has its own Intelligent Platform Management (IPM) controller that monitors its operations and health and reports to the shelf manager. There is also a higher-level, external manager that is in charge of managing multiple shelves within the telco office.

The Chassis Management Module

One example of an ATCA-compliant shelf manager is the Intel NetStructure Chassis Management Module (CMM). It performs all the functions described previously. It also has event logging and alarm capabilities for multiple node and/or fabric slots, as well as for system power entry modules and fan trays.

Based on changing health conditions, different mechanisms can be employed to monitor components and send alarms. Both the hardware and software of the CMM are designed for adaptability, so solution designers can extend the management software stack to address application-specific requirements.

To enable seamless failover of the management function if one module fails, two CMMs can be configured in an active-standby mode to provide redundancy. Essential information is constantly synchronized between the two devices. This includes IPMB state information, system event logs, health information, CMM configuration information, user names and passwords, power and hot-swap information, cooling parameters, and the electronic-keying information used to prevent incompatible boards from connecting with the backplane. Communications between the two CMMs typically take place over dedicated, redundant management buses, or they can be configured to communicate across redundant Ethernet connections.
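
The active-standby arrangement can be pictured with a small model. This is a minimal sketch, not the CMM's actual software interface: the `ChassisManager` and `RedundantPair` classes, the state keys, and the health flag are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChassisManager:
    """Hypothetical stand-in for one management module."""
    name: str
    role: str  # "active" or "standby"
    state: dict = field(default_factory=dict)
    healthy: bool = True


class RedundantPair:
    """Keeps a standby manager synchronized and promotes it on failure."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def update(self, key, value):
        """Apply a change on the active unit and mirror it to the standby."""
        self.active.state[key] = value
        if self.standby.healthy:
            self.standby.state[key] = value

    def failover_if_needed(self):
        """Promote the standby when the active unit is unhealthy.

        Returns True if the roles were swapped.
        """
        if self.active.healthy or not self.standby.healthy:
            return False
        self.active.role, self.standby.role = "standby", "active"
        self.active, self.standby = self.standby, self.active
        return True


if __name__ == "__main__":
    pair = RedundantPair(ChassisManager("cmm1", "active"),
                         ChassisManager("cmm2", "standby"))
    pair.update("fan_speed_percent", 60)
    pair.active.healthy = False          # simulate a failure of cmm1
    pair.failover_if_needed()
    print(pair.active.name, pair.active.state)
```

Because every update is mirrored as it happens, the promoted unit already holds the state it needs, which is the property that makes the failover seamless.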

HA Middleware

The keys to HA are redundancy, configuration management, and cluster management. HA middleware is the controlling entity for monitoring and managing redundant, clustered resources and configurations. The HA middleware is an integral part of the software stack running on an ATCA-compliant blade (see Figure 1). It maintains a system model of the components that comprise the cluster, defines how faults are detected, and instigates appropriate actions (e.g., reconfiguring the cluster or notifying and restarting affected application components). Because the members of each cluster may be different, and may be running different OSs and HA middleware, it's essential that the solution be based on industry standards that enable interoperability in heterogeneous environments.

To deliver appropriate levels of availability, the HA middleware must do the following:

  • Be efficient and fast. This is essential to enable quick failover and to minimize the impact on critical telecommunications functions.
  • Be self-managing and self-reliant. The failover cycle must be able to operate automatically and in real time, without human intervention. The HA middleware must therefore monitor and collect data about all critical system resources, including hardware, software, the OS, and applications, so it can respond quickly and appropriately.
  • Configure and maintain a system-wide state model. The model must represent all managed resources in the system, monitor changes, and comprehend the intricacies of resource dependencies and interdependencies.
  • Implement automatic data check-pointing. The HA middleware must continuously copy ongoing transaction and application state data to a hot standby resource to enable fast failovers without service interruption or data loss.
  • Detect and diagnose faults. The HA middleware must detect and analyze errors, initiate appropriate recovery actions, and send alert messages as appropriate.
  • Support fast, policy-based recovery. Actions can cover a wide range of activities, ranging from restart of a failed application to failover to a redundant hardware component. The solution must support complex processes with multiple steps and initiate alternative recovery actions as needed.
  • Dynamically manage system components and dependencies. The HA middleware is responsible for initiating the actions and orchestrating the role assignments that maintain continuous availability under changing conditions.
  • Provide administrative access and control. Efficient tools and services are needed to simplify the ongoing maintenance and management of the system.

The HA middleware continuously checks the health of every entity on each blade, including applications, the OS, and hardware entities (network ports, memory, etc.). It also communicates with the shelf manager, which communicates, in turn, with the IPM controller on each blade. The HA middleware can access each controller's event logs to quickly detect potential issues, such as a rising temperature or a failed component.
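
One way to picture the system-wide state model and its resource dependencies is as a directed graph. The sketch below is an assumption-laden illustration: the resource names and the `DEPENDS_ON` table are invented, and the function reports every resource that could be affected by a fault without modeling redundancy (a bonded interface, for example, may survive the loss of one NIC).

```python
from collections import deque

# Hypothetical model: each resource lists the resources it depends on.
DEPENDS_ON = {
    "call-control-app": ["os", "eth-bond0"],
    "billing-app": ["os", "disk-mirror"],
    "os": ["cpu-blade-3"],
    "eth-bond0": ["nic0", "nic1"],
    "disk-mirror": ["disk0", "disk1"],
}


def affected_by(failed, depends_on=DEPENDS_ON):
    """Return every resource that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a failed resource to its dependents.
    dependents = {}
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(resource)

    affected = set()
    queue = deque([failed])
    while queue:  # breadth-first walk over the inverted graph
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("cpu-blade-3")))
```

A real implementation would combine a walk like this with redundancy information, so that only resources with no surviving alternative are treated as failed.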

In a typical fault management cycle, the HA middleware:

  1. Analyzes the available information to determine the nature and location of the fault.
  2. Performs root-cause analysis to identify the origin of the problem.
  3. Isolates the problem to avoid a system failure.
  4. Initiates policy-based recovery actions to restore expected behavior.

Two hardware features in an ATCA-compliant system are especially important in achieving fast failover: the update channel and the shelf manager. The update channel provides fast communications with boards for checking heartbeats and synchronizing application states and data between active and standby boards. To further reduce response times, the shelf manager can use health data to take corrective action directly. This includes generating alerts and e-mails, as well as reconfiguring a warm standby board.
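
Heartbeat checking and policy-based recovery can be sketched in a few lines. Everything here is an assumption made for illustration: the timeout value, the action names in `RECOVERY_POLICY`, and the `try_action` callback are hypothetical and do not come from any particular HA middleware product.

```python
from typing import Callable, Optional

HEARTBEAT_TIMEOUT_S = 0.5  # assumed threshold, chosen only for the example

# Assumed escalation policy: try the cheapest action first.
RECOVERY_POLICY = ["restart_application", "failover_to_standby", "raise_alarm"]


def detect_faults(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Return the components whose last heartbeat is older than `timeout`."""
    return sorted(name for name, seen in last_heartbeat.items()
                  if now - seen > timeout)


def recover(component: str,
            try_action: Callable[[str, str], bool]) -> Optional[str]:
    """Walk the policy until one action succeeds; return that action's name."""
    for action in RECOVERY_POLICY:
        if try_action(component, action):
            return action
    return None  # every action failed; the operator must intervene


if __name__ == "__main__":
    beats = {"blade-1": 10.0, "blade-2": 9.2}  # seconds, same clock as `now`

    def demo_action(component, action):
        # Pretend the restart fails and the failover succeeds.
        return action == "failover_to_standby"

    for faulty in detect_faults(beats, now=10.1):
        print(faulty, "->", recover(faulty, demo_action))
```

Passing timestamps in explicitly, rather than reading the clock inside the function, keeps the detection logic deterministic and easy to test.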

CGL: A Hardened, Standards-Based OS

To enable truly carrier-grade solutions, the OS must be as robust as the hardware and middleware, and also be integrated within the complete HA solution. To address this requirement, the OSDL CGL Working Group has defined additions to Linux that specifically address telecommunications requirements. These extensions have been designed to support ATCA-based solutions and to support standards-based HA middleware interfaces.

Redundancy Support

Redundant boards and rapid failover are critical for ensuring HA in an ATCA-compliant platform. The OS provides two key functions that are important in supporting this capability: Ethernet connectivity and redundant data storage.

In ATCA-compliant boards, Ethernet connectivity is typically provided by redundant Network Interface Controllers (NICs) that are configured to appear as a single virtual interface using a technique called "Ethernet bonding." If one NIC fails, the remaining bonded NIC(s) take over transparently. Support for Ethernet bonding is a key CGL requirement, and solutions are currently available. These include the bonding driver developed by the SourceForge "Channel Bonding" project and the Intel Advanced Networking Services Driver (iANS). With these drivers, bonded NICs can be configured in a variety of modes, ranging from simple failover to a standby NIC, to a high-bandwidth "link aggregation" mode, in which all bonded NICs are simultaneously active.
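
The simple failover mode can be illustrated with a toy model. This sketch is not the Linux bonding driver or iANS: the `BondedInterface` class is hypothetical, and its choice to stay on the current NIC when a failed NIC recovers is just one possible policy.

```python
class BondedInterface:
    """Toy model of a failover bond: one active NIC, the rest on standby."""

    def __init__(self, nics):
        # Insertion order doubles as priority when choosing a replacement.
        self.link_up = {nic: True for nic in nics}
        self.active = nics[0]

    def set_link(self, nic, up):
        """Record a link-state change and re-elect the active NIC if needed."""
        self.link_up[nic] = up
        if not self.link_up.get(self.active, False):
            self.active = next(
                (name for name, ok in self.link_up.items() if ok), None)

    def send(self, frame):
        """Return the NIC that would carry the frame."""
        if self.active is None:
            raise ConnectionError("no healthy NIC in bond")
        return self.active


if __name__ == "__main__":
    bond = BondedInterface(["eth0", "eth1"])
    print(bond.send(b"hello"))      # carried by the first NIC
    bond.set_link("eth0", False)    # simulate a cable pull
    print(bond.send(b"hello"))      # carried by the standby NIC
```

The caller of `send` never names a physical NIC, which mirrors how applications see a single virtual interface regardless of which link is carrying traffic.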

To achieve 99.999% or better availability, it's critical that disk data corruption and disk failures do not lead to system downtime. One way to meet this requirement is by employing a Redundant Array of Inexpensive Disks (RAID) system. The OSDL CGL requires support for software RAID 1 (disk mirroring), which maintains duplicate sets of all data on separate disk drives. RAID 1 also allows booting from a mirrored drive if the primary disk drive fails. In addition, ATCA-compliant systems require OS support for hot swapping of defective disk drives (or entire storage boards) and for resynching the replacement drive or board after insertion, without having to take down the system. The Linux Multiple Devices (md) driver, along with other user space RAID tools, provides these essential features.
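
Mirroring, failure, and resynchronization after a hot swap can be modeled in memory. This is a conceptual sketch only: the `MirroredVolume` class and its methods are hypothetical and bear no relation to the interface of the Linux md driver.

```python
class MirroredVolume:
    """Toy RAID 1 model: every write goes to all healthy mirrors."""

    def __init__(self, disk_names=("disk0", "disk1")):
        self.disks = {name: {} for name in disk_names}  # block -> data
        self.failed = set()

    def write(self, block, data):
        for name, blocks in self.disks.items():
            if name not in self.failed:
                blocks[block] = data

    def read(self, block):
        """Read from the first healthy mirror."""
        for name, blocks in self.disks.items():
            if name not in self.failed:
                return blocks[block]
        raise IOError("all mirrors have failed")

    def fail(self, name):
        self.failed.add(name)

    def hot_swap(self, name):
        """Replace a failed disk and resynchronize it from a healthy mirror."""
        healthy = [b for n, b in self.disks.items() if n not in self.failed]
        if not healthy:
            raise IOError("no healthy mirror to resynchronize from")
        self.disks[name] = dict(healthy[0])  # full copy, like a resync
        self.failed.discard(name)


if __name__ == "__main__":
    vol = MirroredVolume()
    vol.write(0, b"boot")
    vol.fail("disk0")
    vol.write(1, b"data")          # service continues on the surviving disk
    vol.hot_swap("disk0")          # replacement is resynchronized
    print(vol.disks["disk0"] == vol.disks["disk1"])
```

The volume stays readable and writable throughout, which is the behavior the hot-swap and resynchronization requirements are meant to guarantee.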

Hot Swap Support

To enable hot swapping of individual devices and entire boards within an ATCA-compliant chassis, the OS must support persistent device naming. To fulfill this need, a CGL-conforming OS provides a policy-based subsystem for the naming of transient components. This allows long-running HA applications to gracefully deal with changing resources, without restarting applications or rebooting the OS.
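
The idea behind persistent naming is that an application refers to a stable name while the kernel-assigned name is free to change across hot swaps. The sketch below is a hypothetical illustration: the `PersistentNamer` class, the slot-based identifiers, and the policy table are assumptions, not the naming subsystem of any specific CGL distribution.

```python
class PersistentNamer:
    """Maps a stable hardware identity to a persistent, policy-defined name."""

    def __init__(self, policy):
        self.policy = policy   # stable id -> persistent name
        self.current = {}      # persistent name -> transient kernel name

    def device_added(self, stable_id, kernel_name):
        """Bind the persistent name to whatever name the kernel assigned."""
        name = self.policy.get(stable_id)
        if name is None:
            return None        # no policy entry for this device
        self.current[name] = kernel_name
        return name

    def device_removed(self, stable_id):
        name = self.policy.get(stable_id)
        self.current.pop(name, None)

    def resolve(self, name):
        """Return the current kernel name, or None while the device is absent."""
        return self.current.get(name)


if __name__ == "__main__":
    namer = PersistentNamer({"slot3/port0": "data-disk"})
    namer.device_added("slot3/port0", "sdb")
    namer.device_removed("slot3/port0")        # board pulled for replacement
    namer.device_added("slot3/port0", "sdc")   # replacement probes differently
    print(namer.resolve("data-disk"))
```

An application that always opens `data-disk` keeps working after the swap, even though the underlying kernel name changed, so it does not need to be restarted.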

HA Middleware Support

HA middleware relies on a set of services provided by the OS, including full support for IPMI 1.5 and for the Hardware Platform Interface defined by the Service Availability Forum (SAF). Without IPMI support, solution providers would be forced to depend on costly custom software stacks to create complete HA solutions. A CGL-conforming OS includes an IPMI driver that gives software developers a direct connection to each blade's IPM controller. It also provides a framework for enabling system management software, and for reporting critical OS errors to the shelf manager via the IPMB. This enables the shelf manager to take appropriate action quickly to maintain services.

The SAF is an industry consortium of major players in the telecommunications industry. The goal of the SAF is to develop a framework and specifications for service availability, so highly available carrier-grade systems can be developed using off-the-shelf platforms, middleware, and applications. The SAF has published the Hardware Platform Interface, which is the programming interface between the HA middleware and the platform. Members of the forum are working together to create multiple implementations of the specification. A CGL-conforming OS provides a compliant hardware platform interface, so it can interface seamlessly with standardized HA middleware and applications. This enables additional layers of protection to be built on top of IPMI, utilizing multiple communication paths (IPMB, Ethernet, etc.) as needed to meet the extreme demands of carrier environments.

The TCO Advantages of CGL

As an open, standards-based OS, CGL complements the cost advantages of ATCA, enabling a consistent, carrier-grade operating environment that supports a wide range of telecommunications architectures and applications. Multiple vendors offer CGL distributions based on the freely available Linux kernel, so pricing is competitive and developers have considerable flexibility in choosing the distribution that best meets their specific feature and cost requirements.

The extended feature set of CGL-compliant OSs also reduces the need for users to develop and maintain custom solutions. A large number of the CGL requirements have already been implemented as open source projects. Other requirements are a more natural fit for proprietary implementations based on open interfaces, and this is fueling a growing ecosystem of independent software vendors. This combination of software development models is resulting in rapid development and extensive, off-the-shelf options that can support very diverse requirements, while reducing overall costs.

CGL requirements also focus on serviceability, which reduces TCO by ensuring that CGL systems can be maintained, upgraded, and debugged quickly. Support for the Simple Network Management Protocol is one example that can reduce costs by simplifying integration with existing management frameworks. In addition, CGL requirements help to ensure that sufficient information is available in multiple locations (via kernel, system, and application dumps) for debugging system failures.

CGL also includes a variety of features that help to accelerate problem resolution, improve support for security solutions, and enable administrators to debug a live system. In combination with the wide availability of administrators who are familiar with the Linux environment, these tools and capabilities can dramatically reduce both system downtime and total costs.

Summary

To sustain profitability in today's market, telecommunications service providers need solutions that combine HA (99.999%) with a low TCO. In conjunction with CGL and HA middleware, ATCA provides an open, standards-based, carrier-grade solution that meets these requirements.

Building blocks are available today, including multiple CGL distributions that can be combined with off-the-shelf or custom HA middleware to build comprehensive telecommunications solutions that are more affordable, scalable, and adaptable than traditional, proprietary architectures. These systems can be developed more quickly, and they typically deliver better performance in a smaller footprint. They can also be designed with full redundancy, quick failover mechanisms, and advanced serviceability features that meet the HA requirements of critical telecommunications applications.

About Kirti Devi
Kirti Devi is technical marketing manager in Intel's Communication Infrastructure Group. She has held numerous technical marketing roles over her seven years at Intel. Kirti's current focus is promoting modular communications platforms including AdvancedTCA technology and products. Kirti holds a BS in mathematics, physics, and chemistry and has an MBA in marketing and one in international business.
