
Self-healing, Self-managing Networks: A Networking Nirvana?

DQI Bureau

28 Feb 1998 00:00 IST


Despite what the poets say, it is business that makes the world go round. And helping business go round better is the business communications infrastructure: the network. In a relatively short time, computers and computer networks have revolutionized the way people work and conduct business. Today, life virtually comes to a standstill if networks fail. And in mission-critical environments like hospitals and stock exchanges, a few seconds of network downtime could have disastrous consequences.


As we progress, networks continue to grow larger, with new servers and nodes being added every second, and to become more complex and sophisticated. Staggering volumes of data now travel through them, hence the need to manage the data residing in these networks and to ensure its security. If the data on the network of a hospital, a national defense system, a power plant, or even a local supermarket is corrupted, the consequences can be severe.

In theory, a computer network is easy enough to understand: you connect any number of single computers together, assign a "traffic cop" computer to see that the proper data gets to the proper destination, and sit back and let the data flow. It's easy enough until you have to find out why everything stopped for no apparent reason. This is where network management comes in.

A comprehensive set of tools (network management applications) is indispensable to a network administrator who needs to ensure the smooth operation of a network. These applications provide a way to find out what your network comprises, proactively monitor it, discover signs of trouble, and rapidly fix any problems that arise.


Common Network Management Functions

Most enterprise Network Management Systems (NMSs) of today provide the following basic
functions:

Auto Discovery and Topology, where the NMS automatically "discovers" all the elements or nodes in a network and displays them, preferably as a graphical map. The map may also show how the elements relate to each other, both physically and logically.

Configuration management provides a way to look at and change the operational parameters of an application, a node, or a group of nodes.


Fault management tools help detect, isolate, and correct problems in a network.

As the industry has matured, network management application providers have made great strides in making network management easy and accessible. They have recognized that instead of just detecting a network failure after the event, one has to anticipate a failure and prevent it before it happens. Toward this end, there have been efforts to make the distributed networks of today and tomorrow self-healing and self-managing.

Some additional functions that an NMS might provide include:


Performance Management, which provides insight into the performance of the network and offers ways to fine-tune operational parameters for optimal performance.

Trending and Capacity Planning, which looks at historical data of network operation and provides graphs and charts showing network traffic trends. These trends can then be extrapolated and used to plan network expansion.

Security, by which one can restrict users' access to certain resources. This could range from authenticating logins for blanket access to assigning different privileges based on the login ID.


Accounting and Chargeback, which supports environments where the cost of running a network is spread across various departments, based on their usage of network bandwidth and server resources.
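
To make the trending and capacity planning function listed above concrete, here is a minimal sketch of how a traffic trend might be extrapolated. It assumes evenly spaced utilization samples and a plain least-squares straight line; the function names and the choice of a linear fit are illustrative assumptions, not a description of how any particular NMS works.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through the points (0, samples[0]), (1, samples[1]), ...

    Returns (slope, intercept). Samples are assumed to be evenly spaced in time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_capacity(samples: Sequence[float], capacity: float) -> Optional[float]:
    """Periods after the last sample until the fitted line reaches capacity.

    Returns None when the trend is flat or falling (no crossing is predicted).
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))
```

Given monthly peak utilization figures, `periods_until_capacity(samples, capacity)` gives a rough estimate of how many months remain before a link is saturated. Real traffic is rarely linear, so a result like this is a planning hint rather than a forecast.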

The commitment required to realize the dream of self-healing, self-managing networks is not a small one. New network management applications need to be implemented along with a comprehensive management strategy. The strategy should include the following major aspects of network management:

Policy-based Distributed Intelligence and Embedded Automation

Service and Application Management

Any time, anywhere access to Network Management

Network managers are faced with a multi-dimensional problem. They are expected to maintain a high level of service while dealing with a growing number of technologies, products from multiple vendors, and increasing end-user requirements. Their task is further compounded by having to manage highly dynamic and geographically dispersed networks. The norm is no longer a centralized, legacy-based environment, but rather a distributed, client-server model.


There are a number of basic difficulties in managing complex networks. The continual growth in the number of users and the constant reconfiguration associated with adding or rearranging nodes are immediate problems. The simultaneous emergence of distributed, high-bandwidth applications, such as groupware, multimedia, and videoconferencing, is straining the capacity of networks. Furthermore, the rapid deployment of new high-bandwidth and switched technologies, such as Fast Ethernet, Asynchronous Transfer Mode (ATM), and LAN switching, is adding another dimension to network management requirements (although ATM, and even LAN switching, open the possibility of taking greater control of the network by controlling the way connections are established).

The traditional network management approach uses intelligent Simple Network Management Protocol (SNMP) agents in the hub to collect and reduce data while relying on the NMS for analysis and control. This approach has some major limitations. For instance, the NMS is a single point of failure, in terms of both its hardware and its network connectivity. The NMS is also overburdened with the responsibility of keeping an eye on hundreds or thousands of devices in the network. As a result, it cannot really respond in real time to potential problems. Most importantly, the NMS relies on a human to respond after a problem has already occurred.

The solution to the limitations described above is not to confine NMS functions to a centralized platform or workstation, but rather to distribute the proper NMS intelligence throughout the network. The administrator only needs to distribute the appropriate policies, which determine the ideal behavior of the network, and the network itself will implement the policies locally and in real time with the help of embedded applications, rather than relying on the NMS.

Let us look at an example of this approach: an application that takes the concept of distributed embedded intelligence to the point that the network itself resolves a potential bottleneck before it develops.

Averting Network Storms

With the rapid increase in the number of nodes on network segments, data traffic inevitably increases while the bandwidth available to each node on a segment keeps shrinking. When data lines get "clogged", data packet collisions occur. When a misconfigured or faulty node puts out excessive broadcasts, causing what is called a "network storm", every other node in the network (broadcast domain) is busy processing the spurious broadcasts, and any useful traffic has to wait until the broadcast storm subsides or is terminated.

The job of detecting and terminating a network storm is difficult, even with advanced tools such as RMON (Remote Network Monitoring) probes and intelligent hubs. Typically, it will take an administrator two to four hours to solve a broadcast storm problem. During this time the network may be virtually unusable.

One application that automates the detection and termination of network storms is NetStorm Terminator. It guards against such occurrences by proactively monitoring network traffic, comparing it to predefined baseline thresholds, and terminating network storms automatically and quickly (within seconds), without user intervention.
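
The general idea of threshold-based storm control can be sketched in a few lines. This is a hypothetical illustration only, not NetStorm Terminator's actual algorithm: the class names, the per-port broadcast counters, and the "several consecutive intervals over threshold" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StormPolicy:
    """Policy an administrator might distribute to a hub or switch (assumed fields)."""
    max_broadcasts_per_sec: float   # baseline threshold for one port
    consecutive_intervals: int = 3  # intervals over threshold before acting


@dataclass
class StormDetector:
    policy: StormPolicy
    _strikes: Dict[str, int] = field(default_factory=dict)
    disabled_ports: List[str] = field(default_factory=list)

    def observe(self, port: str, broadcasts: int, interval_sec: float) -> bool:
        """Record one sampling interval; return True if the port was just disabled."""
        if port in self.disabled_ports:
            return False
        rate = broadcasts / interval_sec
        if rate > self.policy.max_broadcasts_per_sec:
            self._strikes[port] = self._strikes.get(port, 0) + 1
        else:
            self._strikes[port] = 0  # traffic is back under the baseline
        if self._strikes[port] >= self.policy.consecutive_intervals:
            # Stand-in for shutting down the offending port on real hardware.
            self.disabled_ports.append(port)
            return True
        return False
```

Because the check runs locally on each sampling interval, an offending port can be isolated within a few seconds, with no round trip to a central NMS and no operator involved. Requiring several consecutive intervals is one way to avoid reacting to a brief, legitimate burst of broadcasts.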

Armed with the proper embedded network management applications, a network administrator can maintain the level and quality of network service expected, while cutting the administrative overhead required to do so. This type of network management model, where policies are configured centrally from an NMS and the implementation is distributed to the intelligent end nodes, is extremely scalable. It delivers networks that are self-learning, self-healing, and self-managing.

Service And Application Management

Traditionally, network management involves configuring, monitoring, and maintaining a collection of physical components. This approach has enabled network managers to diagnose and solve hardware problems when they occur and to keep their networks physically operational. Users, however, don't know or care about node, port, or link status. They don't know about hubs, switches, or routers.

The typical user only wants access to network resources (application servers such as email and web servers, as well as file servers) with prompt response times. The user wants to get his or her work done without having to think about potential network problems. It is with this practical business approach in mind that the NMS needs to manage networks: in the context of what really matters, which is non-stop, application-level usability.

Monitoring Real-time Server Response

Traditionally, server availability was monitored by sending a "ping" (verifying that there is network connectivity from the source to the destination). A successful ping, however, does not guarantee that the application residing on the server machine is alive and well. What is needed is a way to monitor the real-time health of the actual application.

As an example, let's look at an application called VitalStat. VitalStat provides intelligent management at the application level, based upon server response time for a typical transaction between a client and a server.

Given a list of application servers on the network (this list could potentially be automatically "learnt"), VitalStat automatically measures the elapsed time for a node to complete a full transaction with an application server, for example, downloading a web page (HTML page) from a web server (HTTP server).
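
As a rough illustration of what measuring a full transaction means, compared with a plain ping, the sketch below times an arbitrary transaction and treats any failure as "application not available". The function names are assumptions for this example and say nothing about how VitalStat itself is implemented; `urllib.request.urlopen` is the standard-library call used for the sample HTTP download.

```python
import time
import urllib.request
from typing import Callable, Optional


def measure_response_time(transaction: Callable[[], object]) -> Optional[float]:
    """Run one full transaction; return elapsed seconds, or None if it failed."""
    start = time.perf_counter()
    try:
        transaction()
    except Exception:
        # The host may still answer a ping, but the application did not respond.
        return None
    return time.perf_counter() - start


def fetch_page(url: str, timeout: float = 5.0) -> bytes:
    """Example transaction: download a complete page from a web server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A monitoring node could then call `measure_response_time(lambda: fetch_page(url))` at regular intervals for each server on its list, where `url` is whatever page the administrator picks as a representative transaction.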

VitalStat correlates actual response time to a previously gathered "baseline", as well as other performance characteristics, and detects deviations from an acceptable performance level. When deviations are detected, VitalStat determines whether the cause of the deviation is application, server, or network related. The application also makes recommendations on how to fix deviations and prevent future occurrences. The network administrator's involvement in the scenario is minimal, if it is needed at all.
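
One simple way to compare a response time against a baseline is to flag any measurement that exceeds the baseline mean by more than a few standard deviations. The sketch below assumes that rule; the function name and the default tolerance of three standard deviations are illustrative choices, and the root-cause analysis described above is deliberately left out.

```python
from statistics import mean, stdev
from typing import Sequence


def is_deviation(baseline: Sequence[float], observed: float, tolerance: float = 3.0) -> bool:
    """True if `observed` is slower than the baseline by more than
    `tolerance` standard deviations (an assumed, simplistic rule)."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two measurements")
    threshold = mean(baseline) + tolerance * stdev(baseline)
    return observed > threshold
```

With baseline response times of around 0.2 seconds, a reading of 0.21 seconds would pass while a reading of 0.9 seconds would be flagged. A production tool would likely also account for time of day and gradual drift in the baseline.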

Virtual Grouping of Users and Quality of Service

In traditional routed networks, users are grouped together based on some physical attribute (where they reside, where the network connection is) rather than who they are, what they do, or what network services they need to access. Most router management tools have a cumbersome box-based approach to management instead of a systems-based approach: they concern themselves with the tedious and error-prone task of configuring every individual router and each of its parameters, ports, protocols, subnets, filters, and so on.

Network managers have a new solution to these management constraints: they are now able to identify nodes based on how they use the network. Users can be identified and grouped in different ways, such as by physical location, by the network IDs of members, or even by the type of network layer protocol or applications they use.

Whenever a user plugs into the network, the network (armed with embedded automation) is intelligent enough to determine the appropriate group membership based on the user's characteristics (network ID, protocol, applications used, and so on).
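
A rule-based version of this kind of automatic group assignment might look like the following sketch. The rule fields, the group names, and the first-match-wins ordering are assumptions made for illustration; the article does not specify how membership is actually computed. Subnet matching uses Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class GroupRule:
    group: str
    subnet: Optional[str] = None       # network ID, e.g. "10.1.0.0/16"
    protocol: Optional[str] = None     # network layer protocol, e.g. "IP" or "IPX"
    application: Optional[str] = None  # application the node is seen using


@dataclass(frozen=True)
class Node:
    address: Optional[str]             # IP address, or None for non-IP nodes
    protocol: str
    applications: FrozenSet[str] = frozenset()


def assign_group(node: Node, rules: Sequence[GroupRule], default: str = "unassigned") -> str:
    """Return the group of the first rule whose stated criteria all match the node."""
    for rule in rules:
        if rule.subnet is not None:
            if node.address is None:
                continue
            if ipaddress.ip_address(node.address) not in ipaddress.ip_network(rule.subnet):
                continue
        if rule.protocol is not None and rule.protocol != node.protocol:
            continue
        if rule.application is not None and rule.application not in node.applications:
            continue
        return rule.group
    return default
```

The administrator would define the rules once, centrally, and each switch or hub would evaluate them locally whenever a node connects. That is the same policy-distribution pattern described earlier for storm control.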

Prasad Pammidimukkala, Product Manager, Newbridge Networks.
