Using the Cluster

In bioinformatics, jobs require enormous amounts of data transfer and processing. Linking computers together into clusters has proven to be an efficient means of handling this workload. Through memory-intensive processing at high speeds, our lab can search databases throughout the world and organize and store highly specific biological information. Multiple cores also make it possible to run jobs in parallel, so operations that might take months on a conventional computer can be completed in a matter of hours.

The cluster allows multiple users to tackle multiple tasks with minimal interference. Despite its considerable power, the cluster has limits on the amount of data it can process. The scheduler and resource manager maximize efficiency by prioritizing jobs, but users must also take measures to conserve cluster resources, as the following section demonstrates.

Switching to a Compute Node Saves System Resources:

Upon entering your username and password, you initially interface with the login node. This head node serves as the entry point for all operations performed on the cluster. However, running resource-intensive tasks on the login node can interfere with the cluster’s efficiency by taking limited resources away from other users. It is therefore recommended that you move to a compute node for the bulk of your work on the cluster.

Compute nodes are generally used for working in interactive mode to run smaller jobs or to refine the individual components of larger ones. Not only do compute nodes relieve the system of unnecessary drag, they also enable failovers (redundancy safeguards) that protect against data loss and job termination. Simply put, work done on a compute node places considerably less load on shared resources than the same work done on the login node. Because every session begins on the login node, it cannot be skipped outright; instead, it is recommended that users move to a compute node for all situations except those requiring massive data transfer or downloads from remote servers.
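
One way to act on this recommendation is to have a script check where it is running before it starts any heavy work. The following is a minimal Python sketch, not a description of how this cluster is configured. The hostname prefixes "login" and "head" and the function names is_login_node and require_compute_node are hypothetical and would need to be adapted to the naming scheme your cluster actually uses.

```python
import socket

# Hypothetical hostname prefixes for login nodes; replace with your site's names.
LOGIN_NODE_PREFIXES = ("login", "head")


def is_login_node(hostname, prefixes=LOGIN_NODE_PREFIXES):
    """Return True if the short hostname starts with a known login-node prefix."""
    # Drop any domain suffix and compare case-insensitively.
    short_name = hostname.split(".")[0].lower()
    return short_name.startswith(tuple(prefixes))


def require_compute_node(hostname=None):
    """Raise RuntimeError on a login node so that heavy work never starts there."""
    # Fall back to the name of the machine the script is running on.
    hostname = hostname or socket.gethostname()
    if is_login_node(hostname):
        raise RuntimeError(
            f"{hostname} looks like a login node; request a compute node first."
        )
    return hostname
```

Calling require_compute_node() at the top of an analysis script would stop the script immediately if it were launched from the login node by mistake. A prefix check is only a heuristic: it works only if login and compute nodes follow distinct, predictable naming conventions.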