Self-healing, Self-managing Networks: A Networking Nirvana?

DQI Bureau

Despite what the poets say, it is business
that makes the world go round. And what helps business go round better is its
communications infrastructure: the network. In a relatively short time, computers and
computer networks have revolutionized the way people work and conduct business. Today,
life virtually comes to a standstill if networks fail. And in the case of mission-critical
environments like hospitals and stock exchanges, a few seconds of network downtime could
have disastrous consequences.

As we progress, networks continue to grow
larger, more complex, and more sophisticated, with new servers and nodes being added
every second. Today, staggering volumes of data travel through them. Hence the
need to manage the data residing in these networks and to ensure its security. If the data of a
hospital, a national defense system, a power plant, or even a local supermarket
is corrupted, the consequences can only be imagined.

In theory, a computer network is easy
enough to understand: You connect any number of single computers together, assign a
'traffic cop' computer to see that the proper data gets to the proper destination, and sit
back and let the data flow. It's easy enough until you have to find out why everything
stopped for no apparent reason. This is where network management comes in.

A comprehensive set of tools (network
management applications) is indispensable to a network administrator who needs to ensure
the smooth operation of a network. These applications provide a way to find out what
comprises your network, proactively monitor the network, discover signs of trouble, and
rapidly fix any problems that may arise.

Common Network Management Functions

Most enterprise Network Management Systems (NMSs) of today provide the following basic
functions:

  • Auto Discovery and Topology, where the NMS
    automatically 'discovers' all the elements or nodes in a network and displays them,
    preferably as a graphical map. The map may also show how the elements relate to each
    other, both physically and logically.
  • Configuration management provides a way to
    look at and change the operational parameters of an application, node, or a group of
    nodes.
  • Fault management provides tools to help detect,
    isolate, and correct problems in a network.

    As the industry has matured, network
    management application providers have made great strides in making network management easy
    and accessible. They have recognized that instead of just detecting a network failure
    after the event, one has to detect a failure before it happens and prevent it. Toward this
    end, there have been efforts in the direction of making the distributed networks of today
    and tomorrow self-healing and self-managing.

    Some additional functions that an NMS might
    provide include:

    • Performance Management, which provides
      insight into the performance of the network and offers ways to fine-tune operational
      parameters for optimal performance.
    • Trending and Capacity Planning to look at
      historical data of network operation and provide graphs and charts showing network traffic
      trends. These trends can then be extrapolated and used to plan network expansion.
    • Security, by which one can restrict users'
      access to certain resources. This could range from authenticating logins for blanket
      access to assigning different privileges based on the login ID.
    • Accounting and Chargeback facilitate
      environments where the cost of running a network is spread across various departments,
      based on their usage of network bandwidth and server resources.
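
The trending and capacity planning function listed above can be illustrated with a minimal sketch. It assumes equally spaced samples and a simple straight-line trend; the function names, the sample figures, and the 100 Mbps capacity are all hypothetical and chosen only for illustration. A real NMS would likely use richer models.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through equally spaced samples; returns (slope, intercept)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_capacity(samples: Sequence[float], capacity: float) -> Optional[float]:
    """Periods after the last sample until the fitted trend reaches capacity.

    Returns None when the trend is flat or falling, since it never gets there.
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    crossing_index = (capacity - intercept) / slope
    return max(0.0, crossing_index - (len(samples) - 1))


# Hypothetical monthly peak utilisation of one link, in Mbps.
monthly_peak_mbps = [40.0, 45.0, 50.0, 55.0, 60.0]
months_left = periods_until_capacity(monthly_peak_mbps, capacity=100.0)
print(f"Months until the link is projected to be full: {months_left}")
```

The extrapolated figure is only as good as the assumption that past growth continues unchanged, so it is best read as a planning prompt rather than a forecast.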

      The commitment required to realize the
      dream of self-healing, self-managing networks is not a small one. New network management
      applications need to be implemented along with a comprehensive management strategy. The
      strategy should include the following major aspects of network management:

      • Policy-based Distributed Intelligence and
        Embedded Automation
      • Service and Application Management
      • Any time, anywhere access to Network
        Management

        Network managers are faced with a
        multi-dimensional problem. They are expected to maintain a high level of service while
        dealing with a growing number of technologies, products from multiple vendors, and
        increasing end-user requirements. Their task is further compounded by having to manage
        highly dynamic and geographically dispersed networks. The norm is no longer a centralized,
        legacy-based environment, but rather a distributed, client-server model.

        There are a number of basic difficulties
        in managing complex networks. The continual growth in the number of users and the
        constant reconfiguration associated with adding or rearranging nodes are
        immediate problems. The simultaneous emergence of distributed, high-bandwidth
        applications, such as groupware, multimedia, and videoconferencing, is straining
        network capacity. Furthermore, the rapid deployment of new high-bandwidth and
        switched technologies, such as Fast Ethernet, Asynchronous Transfer Mode (ATM), and LAN
        switching, is adding another dimension to network management requirements (although ATM,
        and even LAN switching, open the possibility of taking greater control of the
        network by controlling the way connections are established).

        The traditional network management approach
        uses intelligent Simple Network Management Protocol (SNMP) agents in the hub to collect
        and reduce data while relying on the NMS for analysis and control. This approach has some
        major limitations. For instance, the NMS is a single point of failure, from both an NMS
        hardware and an NMS network connectivity perspective. Also, the NMS is overburdened with the
        responsibility of keeping an eye on hundreds or thousands of devices in the network. As a
        result, it cannot really respond in real time to potential problems in the network. Most
        importantly, the NMS relies on a human to respond after a problem has already occurred.

        The solution to the limitations described above
        is not to confine NMS functions to a centralized platform or workstation,
        but rather to distribute the proper NMS intelligence throughout the network. The
        administrator only needs to distribute the appropriate policies, which determine the ideal
        behavior of the network, and the network itself will implement the policies in real time,
        locally, with the help of the embedded applications, rather than relying on the NMS.

        Let us look at an example of this approach:
        An application that leverages the concept of distributed embedded intelligence to the
        point that network intelligence solves a potential bottleneck before it develops.

        Averting Network Storms

        With the rapid increase in the number of nodes on network segments, data traffic
        inevitably grows while the bandwidth available to each node on a segment shrinks.
        When data lines get 'clogged', data packet collisions occur. When a misconfigured or
        faulty node puts out excessive broadcasts, causing what is called a 'network storm', every
        other node in the network (broadcast domain) is busy processing the spurious broadcasts,
        and any useful traffic has to wait until the broadcast storm subsides or is terminated.

        The job of detecting and terminating a
        network storm is difficult, even when equipped with advanced tools such as RMON (a remote
        monitoring protocol) probes and intelligent hubs. Typically, it will take an administrator
        2-4 hours to solve a broadcast storm problem. During this time the network may be
        virtually unusable.

        One application that automates detection
        and termination of network storms is NetStorm Terminator. It precludes such occurrences by
        proactively monitoring network traffic, comparing it to predefined baseline thresholds,
        and terminating network storms automatically and quickly (within seconds), without user
        intervention.
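
The general idea of comparing traffic to a baseline threshold and acting locally can be sketched as follows. This is not NetStorm Terminator's actual algorithm, which the article does not describe; the class names, the 500 broadcasts-per-second threshold, and the three-interval rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StormPolicy:
    # Both values are illustrative assumptions, not vendor defaults.
    max_broadcasts_per_sec: float = 500.0
    consecutive_intervals: int = 3


@dataclass
class StormMonitor:
    """Embedded agent that enforces a StormPolicy on the ports it watches."""

    policy: StormPolicy
    streak: Dict[str, int] = field(default_factory=dict)
    disabled_ports: List[str] = field(default_factory=list)

    def observe(self, port: str, broadcasts_per_sec: float) -> bool:
        """Record one sample; return True if this sample caused the port to be disabled."""
        if port in self.disabled_ports:
            return False
        if broadcasts_per_sec > self.policy.max_broadcasts_per_sec:
            self.streak[port] = self.streak.get(port, 0) + 1
        else:
            self.streak[port] = 0  # a quiet interval resets the count
        if self.streak[port] >= self.policy.consecutive_intervals:
            self.disabled_ports.append(port)
            return True
        return False


monitor = StormMonitor(StormPolicy())
samples = [("port-7", 120.0), ("port-7", 900.0), ("port-7", 950.0), ("port-7", 1200.0)]
actions = [monitor.observe(port, rate) for port, rate in samples]
print(actions, monitor.disabled_ports)
```

Requiring several consecutive intervals above the threshold is one way to avoid shutting down a port on a brief, harmless burst; the right values would depend on the baseline of the network in question.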

        Armed with the proper embedded network
        management applications, a network administrator can maintain the level and quality of
        network service expected, while cutting the administrative overhead required to do so.
        This type of network management model, where policies are configured centrally from an
        NMS and the implementation is distributed to the intelligent end nodes, is extremely
        scalable. It delivers networks that are self-learning, self-healing, and self-managing.

        Service And Application Management

        Traditionally, network management involves configuring, monitoring, and maintaining a
        collection of physical components. This approach has enabled network managers to diagnose
        and solve hardware problems when they occur and to keep their networks physically
        operational. Users, however, don't know or care about node or port or link status. They
        don't know about hubs, switches, or routers.

        The typical user only wants access to
        network resources (application servers for email, the web, and so on, as well as file servers) with
        prompt response times. The user wants to get his/her work done without having to think
        about potential network problems. It is with this practical business approach in mind that
        the traditional NMS needs to manage networks, in the context of what really matters:
        non-stop, application-level usability.

        Monitoring Real-time Server Response

        Traditionally, server availability was monitored by sending a 'ping' (verifying that there
        is network connectivity from the source to the destination). The 'ping' may have been
        successful, but that does not guarantee that the application residing on the server
        machine is alive and well. What is needed is a way to monitor the real-time health of an
        actual application.
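
The gap between connectivity and application health can be shown with a small sketch. A true ICMP ping usually needs raw-socket privileges, so a plain TCP connection is used here as a stand-in for the connectivity check; the helper names and the deliberately failing local test server are hypothetical.

```python
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Connectivity check: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Application check: does a real request get a 2xx response?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False  # HTTP errors such as 503 end up here


class BrokenApp(BaseHTTPRequestHandler):
    """Stand-in for a server whose host is up but whose application is failing."""

    def do_GET(self):
        self.send_response(503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), BrokenApp)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address[:2]

reachable = port_is_open(host, port)
healthy = http_is_healthy(f"http://{host}:{port}/")
server.shutdown()
server.server_close()
print(f"reachable={reachable} healthy={healthy}")
```

In this demo the connectivity check succeeds while the application check fails, which is exactly the case a ping-only monitor would miss.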

        As an example, let's look at an application
        called VitalStat. VitalStat provides intelligent management at the application level,
        based upon server response time for a typical transaction between a client and a server.

        Given a list of application servers on the
        network (this list could potentially be automatically 'learnt'), VitalStat automatically
        measures the elapsed time for a node to complete a full transaction with an application
        server, for example, downloading a web page (HTML page) from a web server (HTTP server).

        VitalStat correlates actual response time
        to a previously gathered 'baseline', as well as other performance characteristics, and
        detects deviations from an acceptable performance level. When deviations are detected,
        VitalStat determines whether the cause of the deviation is application, server, or network
        related. The application also makes recommendations on how to fix deviations and prevent
        future occurrences. The network administrator's intervention or involvement in the
        scenario is minimal, if it is needed at all.
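
Timing a transaction and comparing it with a baseline can be sketched in a few lines. This is only an illustration of the idea, not VitalStat's method: the function names, the three-standard-deviation rule, and the sample timings are assumptions, and the sketch does not attempt the root-cause analysis (application, server, or network) described above.

```python
import statistics
import time
from typing import Callable, Sequence


def time_transaction(transaction: Callable[[], object]) -> float:
    """Elapsed wall-clock seconds for one complete client/server transaction."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def is_deviation(sample: float, baseline: Sequence[float], tolerance: float = 3.0) -> bool:
    """Flag a sample more than `tolerance` standard deviations above the baseline mean."""
    mean = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return sample > mean + tolerance * spread


# Hypothetical baseline of earlier response times, in seconds.
baseline_seconds = [0.20, 0.22, 0.19, 0.21, 0.23]

# Any callable can stand in for the transaction, e.g. fetching a web page.
elapsed = time_transaction(lambda: time.sleep(0.01))
print(f"measured {elapsed:.3f}s, deviation: {is_deviation(elapsed, baseline_seconds)}")
```

Only slow responses are flagged here; a response faster than the baseline is treated as acceptable.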

        Virtual Grouping of Users and Quality of Service

        In traditional routed networks, users are grouped together based on some physical
        attribute (where they reside, where the network connection is) rather than who they are,
        what they do, or what network services they need to access. Most router management tools
        have a cumbersome box-based approach to management instead of a systems-based approach:
        They concern themselves with the tedious and error-prone tasks of configuring every
        individual router, and each of its parameters, ports, protocols, subnets, filters etc.

        Network managers have a new solution to
        these management constraints: they are now able to identify nodes based on how they use the
        network. Users can be identified and grouped in different ways, such as physical location,
        the network IDs of members, or even the type of network layer protocol or applications
        they use.

        Whenever a user plugs into the network, the
        network (armed with embedded automation) is intelligent enough to determine the
        appropriate group membership based on the characteristics (network ID, protocol,
        applications used etc.).
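
Automatic group assignment of this kind can be pictured as an ordered list of rules evaluated when a node connects. The sketch below is a hypothetical illustration: the group names, the subnet, and the rule order are invented, and real embedded automation would learn a node's protocol and applications from observed traffic rather than from a hand-built record.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    address: str                       # network ID of the node
    protocol: str                      # network layer protocol, e.g. "IP" or "IPX"
    applications: Tuple[str, ...] = ()


# Each rule pairs a hypothetical group name with a predicate; the first match wins.
Rule = Tuple[str, Callable[[Node], bool]]

RULES: List[Rule] = [
    ("finance", lambda n: n.protocol == "IP"
        and ip_address(n.address) in ip_network("10.1.0.0/16")),
    ("video-users", lambda n: "videoconferencing" in n.applications),
    ("legacy-ipx", lambda n: n.protocol == "IPX"),
]


def assign_group(node: Node, rules: List[Rule] = RULES, default: str = "default") -> str:
    """Return the first group whose rule matches the node, else the default group."""
    for group, matches in rules:
        if matches(node):
            return group
    return default


for node in (
    Node("10.1.4.20", "IP"),
    Node("192.168.5.9", "IP", ("videoconferencing",)),
    Node("0000000A.00A0C9123456", "IPX"),
):
    print(node.address, "->", assign_group(node))
```

Because the first matching rule wins, the order of the rules is itself part of the policy.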

        Prasad Pammidimukkala,
        Product Manager, Newbridge Networks.