Clustering
Clustering is an ingenious way to use multiple personal computers or workstations to create the illusion of a single, highly available system. It delivers computing power comparable to that of a mainframe from the sheer combined power of connected personal computers, servers, or workstations. Several things must be present to create a cluster of processors. The primary need is a cluster-aware operating system. Most modern operating systems have this capability, including Windows, Linux-based systems, and Solaris, among others. The second necessity is an interconnection between the servers that carries messages, data, and application coordination information. Lastly, clustering software acts as the control unit that enables the interconnected processors to work together (Sheldon, 2001). Clustering technology normally provides better performance than large symmetric multiprocessing servers, because multiple systems provide greater input and output capacity when serving a large number of network clients.
There are many types of clusters. The main implementations are (Stanton, 2003): CHPC (Beowulf, MPI), load balancing (Mosix, Condor), Web service (LVS, Cisco Local Director), storage (GFS, Lustre, SAN), database (Oracle, IBM DB2), and high availability (Heartbeat). The benefits of clustering are numerous (Sun Microsystems, 2003). They can be summarized as performance, a safeguard against disaster, and the appearance of a single system. Appearing as a single system requires several levels of abstraction to hide the fact that the cluster is made up of multiple machines. Clustering is excellent for providing a highly available service, meaning that the system operates continuously for long periods of time. Clustering can also be used for load balancing. Load balancing is the process of dividing the workload among more than one computing device so that computations finish faster (Tech Target, 2003). A typical example is an Internet site that receives a great deal of traffic. A load-balancing system for such a site could route each request to a different server in turn, as in round-robin scheduling, using the host addresses listed for the site in a domain name system table. Usually, when there are multiple servers, one server acts as the controller and the others handle the requests. In many instances, especially in large distributed installations, one server is kept as a backup in case of disaster.
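The round-robin routing described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product or DNS server does it: the class name RoundRobinBalancer, the server names web1 to web3, and the request strings are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        # cycle() repeats the server list endlessly, which gives the rotation.
        self._rotation = cycle(self._servers)

    def route(self, request):
        """Return the (server, request) pair for the next server in turn."""
        server = next(self._rotation)
        return server, request


# Hypothetical server names; a real site would list host addresses here.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.route(f"GET /page{i}")[0] for i in range(5)]
print(assignments)  # ['web1', 'web2', 'web3', 'web1', 'web2']
```

In this sketch the balancer plays the role of the controller, and the named servers are the machines that handle the requests. Round robin ignores how busy each server is, which keeps it simple but means it works best when requests cost roughly the same.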
Clustering can also be implemented as a low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations (Tech Target, 2003). These are not necessarily parallel computers, which are computers with multiple CPUs, but computers that are linked together with LAN technology such as Ethernet, or through a WAN, as with distributed.net (distributed.net, 2003).
In 1994, a project called Beowulf was implemented by Donald Becker and Thomas Sterling. Beowulf is not a product but a concept for clustering any number of inexpensive computers running the Linux operating system. The goal was to create a parallel processing supercomputer environment at a price well below that of a supercomputer. Initially, the cluster consisted of 16 DX4 processors connected by channel-bonded Ethernet (http://www.beowulf.org/beowulf/history.html). This design is ideal for low-cost supercomputing over a LAN for entities such as a college campus or business park.
An excellent example of parallel processing over a WAN is distributed.net (distributed.net, 2003). This organization connects thousands of users throughout the world via the Internet and uses this cluster of computing devices to solve extremely complex and long-running mathematical computations. Each user downloads a small program that uses idle CPU cycles during down time and can also act as a lightweight screen saver.
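The general idea of spreading one large computation across many volunteer machines can be sketched as follows. This is an assumption-laden toy, not distributed.net's actual client, protocol, or workload: the function names make_work_units and process_unit are hypothetical, and the "search for a number whose square equals a target" problem is only a stand-in for a real computation.

```python
def make_work_units(start, stop, unit_size):
    """Split the range [start, stop) into consecutive blocks of at most unit_size."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]


def process_unit(unit, target):
    """Work one volunteer machine would do: search its block for a match.

    Returns the candidate whose square equals target, or None if the
    block does not contain one.
    """
    lo, hi = unit
    for candidate in range(lo, hi):
        if candidate * candidate == target:
            return candidate
    return None


# The coordinator splits the search space; each block could go to a different machine.
units = make_work_units(0, 1000, 250)

# Here the blocks are processed one after another on a single machine for simplicity.
results = [process_unit(unit, 4489) for unit in units]
found = [r for r in results if r is not None]
print(units)
print(found)  # [67]
```

Because the blocks do not depend on each other, they can be handed out in any order and returned whenever a machine has idle time, which is what makes this kind of problem suitable for a loosely connected WAN cluster.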
The business need for clustering is strong, and most modern operating systems are configured to handle it. Consequently, most major manufacturers and distributors of servers and their accompanying operating systems market clustering products. IBM has developed software called X-Architecture, which brings the reliability and availability of its mainframe servers to the xSeries family of products (IBM, 2003). This product is a clustering solution, meaning it has the capability to tie together multiple technologies from various vendors. On the hardware side, IBM has the eServer Cluster 1350 (IBM, 2003). It is described as combining "xSeries rack-optimized servers running the Linux operating [system] delivering highly scalable, integrated cluster offerings."
Sun Microsystems offers a product that specializes in cluster interconnection for Sun Cluster, called the Scalable Coherent Interconnect (Sun Microsystems, 2003). Its strongest characteristics are a latency of only 10 microseconds and compatibility with Sun servers such as the Sun Fire 6800. Latency is the time it takes for a packet to cross a network connection from sender to receiver. These connections are provided by the Sun Fire Link.
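To see why low latency matters for a cluster interconnect, a common first-order estimate treats the time to deliver a message as the latency plus the message size divided by the bandwidth. The sketch below uses the 10 microsecond latency quoted above; the bandwidth of 100 bytes per microsecond and the message sizes are assumed figures chosen only for illustration, not Sun's specifications.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_bytes_per_us):
    """First-order estimate: fixed latency plus time to push the bytes through."""
    return latency_us + message_bytes / bandwidth_bytes_per_us


LATENCY_US = 10.0   # latency figure quoted in the text
BANDWIDTH = 100.0   # bytes per microsecond; an assumed value for illustration

small = transfer_time_us(64, LATENCY_US, BANDWIDTH)         # short coordination message
large = transfer_time_us(1_000_000, LATENCY_US, BANDWIDTH)  # bulk data transfer

# Share of the total time that is pure latency in each case.
small_latency_share = LATENCY_US / small
large_latency_share = LATENCY_US / large

print(f"small message: {small:.2f} us, latency share {small_latency_share:.0%}")
print(f"large message: {large:.2f} us, latency share {large_latency_share:.1%}")
```

Under these assumptions, latency dominates the cost of small messages and is negligible for large ones. Cluster nodes exchange many small coordination messages, which is why interconnect latency is advertised so prominently.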
The Windows Server 2003 family provides two types of clustering services: Cluster Service and Network Load Balancing. Network Load Balancing clusters enable the scaling of applications, which allows an application to begin small and grow as demand increases. Network Load Balancing enhances both the availability and scalability of Internet server-based programs such as Web servers, streaming media servers, and terminal-type systems (Microsoft, 2003). The Cluster Service provides high availability and scalability for mission-critical applications such as databases, messaging systems, and file and print services.
The main clustering offering from Hewlett Packard (HP) is the OpenVMS Cluster Software (HP, 2003). It is geared mostly towards e-business and can be implemented on the various HP servers: PA-RISC, AlphaServer, and ProLiant. The strongest characteristics of the OpenVMS Cluster Software running on Compaq Tru64 UNIX are that it allows clustering of an Oracle 9i database and that it builds upon the Beowulf cluster concept (HP, 2000).
As mentioned above, the reasons to use clustering are performance, a safeguard against disaster, and the appearance of a single system entity. The disadvantages are important as well (Stanton, 2003). Clustering can be difficult to apply to heterogeneous clusters; that is, it is difficult to achieve seamless performance across varied hardware and software configurations. Additionally, the implementation is complex and will most likely be extremely difficult to code and maintain. Problems also tend to arise from information that is hidden from applications, as well as from hidden overhead associated with latency.
As with most topics relating to Computer Science, I developed a strong fascination with this subject. Writing this research paper on clustering left me feeling that I could have gone a bit farther in my research, but that my initial objectives were satisfied. Working in Web development for the past several years, in a small business, has made me wonder what is needed to support a high-volume e-commerce site, so this research has been very interesting and enlightening. For the requirements of this assignment I trimmed a lot of technical information, but I gained much more in return. This research has given me a clear picture of enterprise applications from a distributed computing point of view and should help me make choices and respond to challenges in my future career.
Bibliography
Sheldon, T. (2001). Corporate Website, Tom Sheldon's Linktionary: Clustering. Big Sur Multimedia. Retrieved: November 30, 2003 from http://www.linktionary.com/c/clustering.html
Stanton, J. (2003). Single System Image and Process Migration. Retrieved: November 27, 2003 from www.seas.gwu.edu/~jstanton/courses/cs251/lectures/week7-2003-ssi.pdf
Sun Microsystems (2003). Corporate Website: Solaris 9 Operating System. Retrieved: November 29, 2003 from http://wwws.sun.com/software/solaris/
Sun Microsystems (2003). Corporate Website: Cluster Interconnects. Retrieved: November 29, 2003 from http://www.sun.com/servers/cluster_interconnects/
Tech Target (2003). Corporate Website Whatis.com: Load Balancing. Retrieved: November 26, 2003 from http://whatis.techtarget.com/definition/0,,sid9_gci214490,00.html
distributed.net (October 6, 2003). Corporate Website: distributed.net. Retrieved: December 1, 2003 from http://distributed.net/
IBM (2003). Corporate Website, IBM: xSeries Clustering. Retrieved: November 29, 2003 from http://www.pc.ibm.com/ww/eserver/xseries/clustering/
Microsoft (2003). Corporate Website, Microsoft: Windows 2000 Clustering Technologies. Retrieved: December 1, 2003 from http://www.microsoft.com/windows2000/technologies/clustering/default.asp
HP (2003). Corporate Website, Hewlett Packard Company: HP clustered systems for Oracle9i Real Applications Clusters. Retrieved: November 28, 2003 from http://h30097.www3.hp.com/oracle9irac/
HP (February 23, 2000). Corporate Website, Hewlett Packard Company: Considerations in Specifying Beowulf Clusters. Retrieved: December 1, 2003 from http://zdnet.com.com/2100-1104-956405.html