HT Condor - Distributed High Throughput Computing

Project Owner

Andre Verheij

Project Credits

  • Jim Cook
  • George Clemens
  • Mike Baker
  • Janusz Tydda
  • Jordan Catling
  • Michael Homsey
  • Murray-Luke Peard
  • Nick Gilbert
  • Andre Verheij

Start Date

15th January 2013

Status

Experimentation completed and terminated 17th May 2013

Problem Statement

Would a distributed platform like HT Condor be viable in student spaces during downtime? Could we distribute load to the student labs when they are not in use in order to achieve research outcomes? What does this mean for the environmental impact of power use in our labs? How complex is ‘parallelising’ software for these types of platform?

Final Brief

We delivered a pilot HT Condor implementation, based on the thesis of Ian Gregory from the Business School. We deployed 43 nodes in the Language learning lab for a period of 3 months, during which time we tested various existing applications, several of them existing research applications or systems, as well as different types of ‘hello world’ offerings. We concluded that the University should pursue a more rounded High Throughput Computing strategy, and that the time and effort needed to implement and maintain a node-based cluster was not offset by its benefit over traditional HPC solutions. The following reasons not to continue with the offering were identified.

  • High impact on the life-cycle of machines, especially those with a physical disk.
  • The process of migrating research applications to support multi-threading is costly and developer intensive. Most existing content was not optimised for this style of computing.
  • Environmental impact was high.

Challenges & Learnings

  • C1. Virtualisation of the control node was not possible due to our implementation of Citrix at the time. Scheduling became challenging when multiple projects were enabled on the cluster.
  • L1. We have since greatly improved the Citrix environment; however, there is no indication as to whether this would solve the problem of a centralised scheduler.
  • C2. HTC threading is very different from standard ways of using HPC software. Gains are not always made, especially on tasks with large data to process. The time needed for our developers and engineers to refactor the code in most cases outstripped the projected benefit.
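
The point in C2 about large data can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not a measurement from the pilot: the function name estimated_speedup, its parameters, and the simplification that all input data passes once through a single link at a fixed rate are hypothetical and chosen purely for illustration.

```python
def estimated_speedup(compute_seconds, data_mb, nodes, transfer_mb_per_s):
    """Rough speedup from splitting one job across independent nodes.

    Illustrative assumptions (not taken from the pilot):
      - the compute work divides evenly across the nodes;
      - all input data is shipped once over a single link at a fixed rate;
      - scheduling and start-up overheads are ignored.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if transfer_mb_per_s <= 0:
        raise ValueError("transfer_mb_per_s must be positive")

    # Time spent moving input data out to the nodes.
    transfer_seconds = data_mb / transfer_mb_per_s
    # Wall-clock time when the compute is spread over the nodes.
    distributed_seconds = compute_seconds / nodes + transfer_seconds
    # A value above 1 means the cluster is faster than one machine.
    return compute_seconds / distributed_seconds


if __name__ == "__main__":
    # Hypothetical workloads on a 43-node pool: same compute cost,
    # very different input sizes.
    for label, data_mb in [("small input", 10), ("large input", 100_000)]:
        ratio = estimated_speedup(3600, data_mb, nodes=43, transfer_mb_per_s=10)
        print(f"{label}: estimated speedup {ratio:.2f}x")
```

Under this simplified model, a compute-heavy job with little input approaches a speedup equal to the node count, while a job dominated by data movement can end up slower than running on a single machine, before any refactoring effort is counted.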

Languages / Framework

HT Condor, Ruby on Rails, MATLAB, Python, VMware.

Links to Resources

Link to Thesis


