Rutgers, The State University of New Jersey
OFFICE OF INFORMATION TECHNOLOGY | OFFICE OF INSTRUCTIONAL AND RESEARCH TECHNOLOGY

HPC - CLUSTER COMPUTING
OIRT TECHNOLOGY MEETING

The OIRT March Technology Meeting focused on one aspect of High Performance Computing (HPC): cluster computing. The evolution of Beowulf-style computing clusters as less expensive yet powerful alternatives to high-end machines and supercomputers has been dramatic. Several groups at Rutgers operate clusters in support of their research goals. Planning is currently underway for OIRT to provide a level of support for promoting this important technology at Rutgers. This meeting served as an opportunity to learn what others are doing with respect to cluster computing, as well as to plan which HPC services need to be developed to support the growth and use of this technology.

Date: 01 March 05

Announcements
  • Sun Microsystems Meeting
    An April meeting with Sun Microsystems is in the works. If there is a particular topic you wish to have discussed, please send suggestions to Tom Grzelak.

Introduction
  • Tom Grzelak - OIRT
    Tom began the discussion by presenting a graph from a recent issue of CTWatch Quarterly, available at http://www.ctwatch.org/quarterly/articles/2005/02/recent-trends/2/ . It plots the architectures of the top 500 supercomputers over the past dozen years and clearly shows that, in the past three years, cluster computing using commodity machines has become the dominant architecture among the world's fastest supercomputers. One of the real benefits of this technology is that it scales from a few PCs to hundreds. The dominant supercomputing architecture is therefore within the technological and fiscal reach of individual research groups at Rutgers. In fact, nearly 20 research groups operate clusters at the university. The meeting began with presentations from a few groups that use cluster computing to support their research.

Rutgers Groups using Cluster Computing
  • Alexei Kotelnikov - School of Engineering
    Alexei presented the history of cluster computing within the School of Engineering, from the original interdepartmental Rutgers Computational Grid of the late 1990s to SOE's operational clusters of today. His presentation, which details the hardware and software currently in use on their clusters, can be found at http://linuxcourse.rutgers.edu/documents/cluster_computing/start.html.

  • Viktor Oudovenko - Department of Physics and Astronomy
    Viktor presented the Physics department's current computing clusters (numbering over 300 machines) and how they are employed. Physics uses many different hardware combinations in its clusters (single- and dual-CPU machines; Pentium III, Athlon, and Opteron CPUs). They use both Ethernet and Myrinet for networking between nodes. At one point, their cluster was ranked among the world's TOP500 supercomputing sites. Viktor's presentation is available online at http://beowulf.rutgers.edu/talks/2005_03_01/Rutgers_cluster_OIRT.htm.

  • Emilio Gallicchio - Department of Chemistry and Chemical Biology
    Emilio described the evolution of their clusters over the past eight years and how they are employed within their research. They have a mix of processors (Pentium III, Alpha, Athlon). Emilio also described their ambitious plans for an 800-processor cluster based upon 400 dual-CPU Opteron machines. Emilio's presentation can be found at http://oirt.rutgers.edu/docs/E_Gallicchio-hpc.pdf.

Cluster Components in use at RU
  • Hardware Vendors
    Microway (SOE, Chemistry)
    Dell (Several Groups: e.g., Env Sciences)
    CMC (Physics, custom-built)
    Penguin (IMCS)

  • Operating Systems
    Debian (SOE)
    Red Hat (Env Sciences, SOE, others)
    SUSE (Physics)

  • Compilers
    gcc
    PGI
    Intel

  • Queueing
    Sun Grid Engine (gridengine.sunsource.net)
    Load Sharing Facility-LSF (LINK)
    Condor (LINK)
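
As an illustration of how work is handed to a queueing system such as Sun Grid Engine, the following is a minimal sketch that renders a batch script as text. It is not taken from any of the presentations. The function name, the job name, the command, and the parallel environment name "mpi" are all hypothetical; parallel environment names in particular are defined by each site's administrators. The `#$` lines use standard Grid Engine directives (`-N`, `-cwd`, `-j y`, `-S`, `-pe`).

```python
def render_sge_script(job_name, command, slots=1, parallel_env=None):
    """Return the text of a Sun Grid Engine batch script.

    parallel_env is a site-specific name (hypothetical here); when it is
    given, the job requests `slots` slots in that parallel environment.
    """
    if parallel_env is None and slots != 1:
        raise ValueError("multi-slot jobs need a parallel environment name")

    lines = [
        "#!/bin/sh",
        f"#$ -N {job_name}",  # job name shown in the queue
        "#$ -cwd",            # run the job from the submission directory
        "#$ -j y",            # merge standard error into standard output
        "#$ -S /bin/sh",      # shell used to interpret the script
    ]
    if parallel_env is not None:
        # Request a number of slots in the named parallel environment.
        lines.append(f"#$ -pe {parallel_env} {slots}")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical eight-slot job; the result would be saved to a file
    # and submitted with qsub.
    print(render_sge_script("demo_job", "./my_simulation", slots=8,
                            parallel_env="mpi"))
```

The rendered text would typically be written to a file and submitted with `qsub`. LSF and Condor use different submission formats.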

Future Support Issues
  • Facilities
    Adequate power and A/C are serious issues in developing some high-end HPC clusters. Retrofitting older buildings may not be possible, or may not be the most cost-effective approach. Further investigation into a central or shared facility should be conducted.

  • Hardware
    - A number of groups expressed the need to develop expertise in high performance file systems to support clusters.
    - Developing a 32-bit test cluster and making it available to on-campus groups for testing operating systems, configurations, and software would be beneficial.
    - A 64-bit large-memory test cluster may have limited utility, as use of 64-bit processors is not widespread on campus.

  • Software
    - There is a need for shared testing, use and development of utility programs such as monitoring software and debuggers.
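
To give a sense of scale for the power and A/C concern raised under Facilities, the following is a rough back-of-the-envelope sketch. The per-node wattage and the function name are assumptions made for illustration, not figures from the meeting. The conversion factors are standard: 1 W is about 3.412 BTU/hr, and one ton of cooling is 12,000 BTU/hr.

```python
WATTS_TO_BTU_PER_HR = 3.412       # 1 watt of heat is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0      # one ton of cooling capacity


def cluster_heat_load(nodes, watts_per_node, overhead_fraction=0.0):
    """Estimate electrical draw and cooling load for a cluster.

    watts_per_node is an assumed average draw per machine;
    overhead_fraction optionally adds a share for switches, storage, etc.
    """
    watts = nodes * watts_per_node * (1.0 + overhead_fraction)
    btu_per_hr = watts * WATTS_TO_BTU_PER_HR
    return {
        "kw": watts / 1000.0,
        "btu_per_hr": btu_per_hr,
        "cooling_tons": btu_per_hr / BTU_PER_HR_PER_TON,
    }


if __name__ == "__main__":
    # 400 machines, as in the planned Chemistry cluster; 300 W per
    # machine is an assumed figure, not a measured one.
    estimate = cluster_heat_load(nodes=400, watts_per_node=300)
    for key, value in estimate.items():
        print(f"{key}: {value:.2f}")
```

Under the assumed 300 W per machine, 400 machines draw about 120 kW and need roughly 34 tons of cooling, which suggests why retrofitting an older building can be difficult.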


Last updated: Wednesday, 11-Oct-2006 10:51:17 EDT

© 2006 Rutgers, The State University of New Jersey. All rights reserved.