Self-healing, Self-managing Networks: A Networking Nirvana?

DQI Bureau

Despite what the poets say, it is business that makes the world go round. And helping business go round better is the business communications infrastructure: the network. In a relatively short time, computers and computer networks have revolutionized the way people work and conduct business. Today, life virtually comes to a standstill if networks fail. And in mission-critical environments like hospitals and stock exchanges, a few seconds of network downtime could have disastrous consequences.

As we progress, networks continue to grow larger, with new servers and nodes being added every second, and become more complex and sophisticated. Today, staggering volumes of data travel through them. Hence the need to manage the data residing in these networks and to ensure its security. If the data of a hospital, a national defense system, a power plant, or even a local supermarket is corrupted, the consequences can only be imagined.

In theory, a computer network is easy enough to understand: you connect any number of single computers together, assign a 'traffic cop' computer to see that the proper data gets to the proper destination, and sit back and let the data flow. It's easy enough until you have to find out why everything stopped for no apparent reason. This is where network management comes in.

A comprehensive set of tools, in the form of network management applications, is indispensable to a network administrator who needs to ensure the smooth operation of a network. These applications provide a way to find out what makes up your network, proactively monitor it, discover signs of trouble, and rapidly fix any problems that arise.

Common Network Management Functions



Most enterprise Network Management Systems (NMSs) of today provide the following basic
functions:

  • Auto Discovery and Topology, where the NMS automatically 'discovers' all the elements or nodes in a network and displays them, preferably as a graphical map. The map may also show how the elements relate to each other, both physically and logically.
  • Configuration Management, which provides a way to look at and change the operational parameters of an application, a node, or a group of nodes.
  • Fault Management, with tools that help detect, isolate, and correct problems in a network.

As the industry has matured, network management application providers have made great strides in making network management easy and accessible. They have recognized that instead of just detecting a network failure after the event, one has to detect an impending failure and prevent it. Toward this end, there have been efforts to make the distributed networks of today and tomorrow self-healing and self-managing.

Some additional functions that an NMS might provide include:

  • Performance Management, which provides insight into the performance of the network and offers ways to fine-tune operational parameters for optimal performance.
  • Trending and Capacity Planning, which looks at historical data of network operation and provides graphs and charts showing network traffic trends. These trends can then be extrapolated and used to plan network expansion.
  • Security, by which one can restrict users' access to certain resources. This could range from authenticating logins for blanket access to assigning different privileges based on the login ID.
  • Accounting and Chargeback, which supports environments where the cost of running a network is spread across various departments, based on their usage of network bandwidth and server resources.
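
The extrapolation described under Trending and Capacity Planning can be sketched as a straight-line fit to historical utilization figures. This is only an illustration: the function names, the one-sample-per-period layout, and the choice of a linear trend are assumptions, not something the article attributes to any particular NMS.

```python
import math
from typing import List, Optional, Tuple


def fit_line(samples: List[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * x + intercept, with x = 0, 1, 2, ...
    (one utilization sample per reporting period)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(samples: List[float], limit: float) -> Optional[int]:
    """Number of future periods before the fitted trend reaches `limit`.
    Returns None when the trend is flat or falling."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    last_x = len(samples) - 1
    x_hit = (limit - intercept) / slope
    return max(0, math.ceil(x_hit - last_x))
```

For example, with monthly link utilization of 40, 45, 50, 55, and 60 percent and a hypothetical planning limit of 80 percent, `periods_until` reports how many more months remain before the trend crosses that limit. Real traffic is rarely this linear, so the result is a planning hint rather than a forecast.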

The commitment required to realize the dream of self-healing, self-managing networks is not a small one. New network management applications need to be implemented along with a comprehensive management strategy. The strategy should include the following major aspects of network management:

  • Policy-based Distributed Intelligence and Embedded Automation
  • Service and Application Management
  • Anytime, Anywhere Access to Network Management

Network managers are faced with a multi-dimensional problem. They are expected to maintain a high level of service while dealing with a growing number of technologies, products from multiple vendors, and increasing end-user requirements. Their task is further compounded by having to manage highly dynamic and geographically dispersed networks. The norm is no longer a centralized, legacy-based environment, but rather a distributed, client-server model.

There are a number of basic difficulties in managing complex networks. The continual growth in the number of users and the constant reconfiguration associated with adding or rearranging nodes are immediately problematic. The simultaneous emergence of distributed, high-bandwidth applications, such as groupware, multimedia, and videoconferencing, is straining network capacity. Furthermore, the rapid deployment of new high-bandwidth and switched technologies, such as Fast Ethernet, Asynchronous Transfer Mode (ATM), and LAN switching, is adding another dimension to network management requirements (although ATM, and even LAN switching, open the possibility of taking greater control of the network by controlling the way connections are established).

The traditional network management approach uses intelligent Simple Network Management Protocol (SNMP) agents in the hub to collect and reduce data while relying on the NMS for analysis and control. This approach has some major limitations. For instance, the NMS is a single point of failure, from both an NMS hardware and an NMS network connectivity perspective. Also, the NMS is overburdened with the responsibility of keeping an eye on hundreds or thousands of devices in the network. As a result, it cannot really respond in real time to potential problems in the network. Most importantly, the NMS relies on a human to respond after a problem has already occurred.

The solution to the above-mentioned limitations is not to confine NMS functions to a centralized platform or workstation, but rather to distribute the proper NMS intelligence throughout the network. The administrator only needs to distribute the appropriate policies, which determine the ideal behavior of the network, and the network itself will implement the policies locally and in real time, with the help of embedded applications, rather than relying on the NMS.

Let us look at an example of this approach: an application that leverages distributed embedded intelligence to the point that the network itself resolves a potential bottleneck before it develops.

Averting Network Storms



With the rapid increase in the number of nodes on network segments, data traffic inevitably grows while the bandwidth available to each node on a segment shrinks. When data lines get 'clogged', data packet collisions occur. When a misconfigured or faulty node puts out excessive broadcasts, causing what is called a 'network storm', every other node in the network (broadcast domain) is busy processing the spurious broadcasts, and any useful traffic has to wait until the broadcast storm subsides or is terminated.

Detecting and terminating a network storm is difficult, even for an administrator equipped with advanced tools such as RMON (a remote monitoring protocol) probes and intelligent hubs. Typically, it takes an administrator two to four hours to resolve a broadcast storm. During this time, the network may be virtually unusable.

One application that automates the detection and termination of network storms is NetStorm Terminator. It heads off such occurrences by proactively monitoring network traffic, comparing it to predefined baseline thresholds, and terminating network storms automatically and quickly (within seconds), without user intervention.
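
The general idea of comparing traffic to a baseline threshold and acting locally can be sketched as follows. This is not how NetStorm Terminator works internally, which the article does not describe. The class name, the per-port broadcast rate input, and the rule of requiring several consecutive intervals above the threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


class StormDetector:
    """Hypothetical embedded agent: disables a port whose broadcast rate
    stays above a baseline threshold for several sampling intervals."""

    def __init__(self, threshold_pps: float, sustained_intervals: int = 3) -> None:
        self.threshold_pps = threshold_pps              # policy set centrally
        self.sustained_intervals = sustained_intervals  # avoids reacting to one spike
        self._over: Dict[str, int] = defaultdict(int)   # consecutive intervals over threshold
        self.disabled: Set[str] = set()

    def observe(self, port: str, broadcast_pps: float) -> bool:
        """Feed one sampling interval; return True if the port is newly disabled."""
        if port in self.disabled:
            return False
        if broadcast_pps > self.threshold_pps:
            self._over[port] += 1
        else:
            self._over[port] = 0  # rate back to normal, restart the count
        if self._over[port] >= self.sustained_intervals:
            self.disabled.add(port)
            return True
        return False
```

Requiring a sustained excess is one possible design choice: it trades a few seconds of reaction time for fewer false alarms on short bursts. In a real device, the point where the port is added to `disabled` is where the hardware port would actually be shut down and the NMS notified.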

Armed with the proper embedded network management applications, a network administrator can maintain the level and quality of network service expected, while cutting the administrative overhead required to do so. This type of network management model, where policies are configured centrally from an NMS and the implementation is distributed to the intelligent end nodes, is extremely scalable. It delivers networks that are self-learning, self-healing, and self-managing.

Service and Application Management



Traditionally, network management involves configuring, monitoring, and maintaining a collection of physical components. This approach has enabled network managers to diagnose and solve hardware problems when they occur and to keep their networks physically operational. Users, however, don't know or care about node or port or link status. They don't know about hubs, switches, or routers.

The typical user only wants access to network resources (application servers such as email and web servers, and file servers) with prompt response times. The user wants to get his or her work done without having to think about potential network problems. It is with this practical business approach in mind that the traditional NMS needs to manage networks, in the context of what really matters: non-stop, application-level usability.

Monitoring Real-time Server Response




Traditionally, server availability was monitored by sending a 'ping' (verifying that there is network connectivity from the source to the destination). A successful 'ping', however, does not guarantee that the application residing on the server machine is alive and well. What is needed is a way to monitor the real-time health of the actual application.

As an example, let's look at an application called VitalStat. VitalStat provides intelligent management at the application level, based upon server response time for a typical transaction between a client and a server.

Given a list of application servers on the network (this list could potentially be 'learnt' automatically), VitalStat measures the elapsed time for a node to complete a full transaction with an application server, for example, downloading a web page (HTML page) from a web server (HTTP server).

VitalStat compares the actual response time with a previously gathered 'baseline', as well as other performance characteristics, and detects deviations from an acceptable performance level. When deviations are detected, VitalStat determines whether the cause is application, server, or network related. The application also makes recommendations on how to fix deviations and prevent future occurrences. The network administrator's involvement in this scenario is minimal, if any.
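
The measure-and-compare step can be sketched in a few lines. This is a simplified illustration and not VitalStat's actual method: the function names are hypothetical, and treating anything slower than the baseline mean plus three standard deviations as a deviation is an assumed rule. The sketch covers only timing and detection, not the diagnosis of whether the cause lies in the application, server, or network.

```python
import statistics
import time
from typing import Callable, Sequence


def time_transaction(transaction: Callable[[], object]) -> float:
    """Run one full client/server transaction and return the elapsed seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def response_limit(baseline: Sequence[float], tolerance: float = 3.0) -> float:
    """Slowest acceptable response time: baseline mean plus `tolerance`
    standard deviations (an assumed rule, chosen for illustration)."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two samples")
    return statistics.mean(baseline) + tolerance * statistics.pstdev(baseline)


def is_deviation(baseline: Sequence[float], observed: float,
                 tolerance: float = 3.0) -> bool:
    """True when the observed response time exceeds the acceptable limit."""
    return observed > response_limit(baseline, tolerance)
```

The `transaction` argument is any callable that performs the whole exchange, for instance a function that downloads a page from the web server being monitored. Keeping it as a parameter lets the same timing code serve email, file, or web servers.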

Virtual Grouping of Users and Quality of Service




In traditional routed networks, users are grouped together based on some physical attribute (where they reside, where the network connection is) rather than who they are, what they do, or what network services they need to access. Most router management tools take a cumbersome box-based approach to management instead of a systems-based approach: they concern themselves with the tedious and error-prone tasks of configuring every individual router, and each of its parameters, ports, protocols, subnets, filters, etc.

Network managers have a new solution to these management constraints: they are now able to identify nodes based on how they use the network. Users can be identified and grouped in different ways, such as by physical location, the network IDs of members, or even the type of network layer protocol or applications they use.

Whenever a user plugs into the network, the network (armed with embedded automation) is intelligent enough to determine the appropriate group membership based on the user's characteristics (network ID, protocol, applications used, etc.).
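
Group assignment of this kind can be pictured as an ordered list of policies checked against each node's characteristics. The sketch below is an illustration under stated assumptions: the `Node` and `GroupPolicy` types, the use of an IP subnet to stand in for the network ID, and the first-match-wins rule are all hypothetical, not taken from any product mentioned here.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Node:
    address: str                               # network ID of the node
    protocol: str                              # network layer protocol, e.g. "IP"
    applications: FrozenSet[str] = frozenset() # applications seen in use


@dataclass(frozen=True)
class GroupPolicy:
    group: str
    subnet: Optional[str] = None               # criteria left as None are not checked
    protocol: Optional[str] = None
    application: Optional[str] = None

    def matches(self, node: Node) -> bool:
        """A node matches when it satisfies every criterion the policy sets."""
        if self.subnet is not None:
            try:
                addr = ipaddress.ip_address(node.address)
            except ValueError:
                return False                   # non-IP address cannot match a subnet
            if addr not in ipaddress.ip_network(self.subnet):
                return False
        if self.protocol is not None and node.protocol != self.protocol:
            return False
        if self.application is not None and self.application not in node.applications:
            return False
        return True


def assign_group(node: Node, policies: List[GroupPolicy],
                 default: str = "unassigned") -> str:
    """Return the group of the first matching policy, or `default`."""
    for policy in policies:
        if policy.matches(node):
            return policy.group
    return default
```

Because the first matching policy wins, the order of the list expresses priority: a policy keyed on an application such as videoconferencing can be placed ahead of a broader subnet policy so that those users land in the group with the service level they need.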

Prasad Pammidimukkala,
Product Manager, Newbridge Networks.
