Moving Data to and from the Cluster

Although the Halcyon cluster is stocked with a versatile database, ongoing developments in bioinformatics continue to require additional methods and tools, so users often need to update and expand their data. This can be done using the login and compute nodes.

Downloading data using wget:

The compute node is used to download data from a remote server onto the cluster with a retrieval utility such as wget. Wget's support for the HTTP, HTTPS, and FTP protocols makes it the most common choice. Git and svn (Subversion) can also be used, but they must access the remote repository over HTTP. To download data using wget, first make sure that you are logged in to the cluster (ssh and password). Then retrieve data from the Web by typing the following at the command line:

$ wget http://website/file

Wget will pull the file from a remote server into your directory. To verify that the file has downloaded, do the following:

$ ls     # list the directory contents.
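The download and the check above can be combined into one small shell helper. This is a minimal sketch, not a cluster-provided tool: the function name fetch_and_check is hypothetical, and it assumes a simple URL whose last path component is the file name. The -c option asks wget to resume a partial download instead of starting over.

```shell
# Hypothetical helper: download a URL with wget, then confirm the file arrived.
# Usage: fetch_and_check http://website/file
fetch_and_check() {
    url="$1"
    file="$(basename "$url")"      # for simple URLs, wget saves under this name
    wget -c "$url" || return 1     # -c resumes a partial download if one exists
    # -s is true only if the file exists and is not empty
    if [ -s "$file" ]; then
        ls -lh "$file"
    else
        echo "download failed or file is empty: $file" >&2
        return 1
    fi
}
```

Running fetch_and_check with the URL from the example above would list the downloaded file on success and return a non-zero status otherwise.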


Uploading data using rsync:

The login node is used to pull data from a local drive (such as a file on a UNCC approved laptop) and then push it to the cluster. For example, information can be pulled from a flash drive or C drive. This is done by synchronizing the drive with the login node using an application such as rsync (preferred), scp, or sftp. 

Uploading a file from the local computer:

  1. Open Terminal but do not log in to the cluster. If you are already logged in, type $ exit. This closes the secure shell so that the prompt changes back to your local login name (alpha, bravo, etc.).
  2. Change your directory ($ cd) to the location of the file you intend to upload, or type $ cd and then drag the folder that contains the file into Terminal.
  3. Sync the file to your cluster login:
$ rsync -aP file username@halcyon:.  # The final dot (.) is the destination path on the cluster, relative to your home directory.

Adding the -a and -P options (-aP) is recommended when using rsync:

-a  # archive mode (copies directories recursively and preserves permissions, timestamps, symbolic links, etc.).

-P  # shows progress during the transfer and keeps partially transferred files so that an interrupted transfer can be resumed (same as --partial --progress).
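Before a large transfer, it can help to preview what rsync would do. The following is a minimal sketch, assuming the same host name (halcyon) used above; the function name push_file is hypothetical. The -n option (--dry-run) is a standard rsync option that reports what would be transferred without copying anything.

```shell
# Hypothetical helper: preview the transfer, then perform it.
# Usage: push_file <file> <username>
push_file() {
    src="$1"
    dest="$2@halcyon:."            # "." is relative to your home directory on the cluster
    [ -e "$src" ] || { echo "no such file: $src" >&2; return 1; }
    rsync -aPn "$src" "$dest" || return 1   # -n: dry run, nothing is copied
    rsync -aP "$src" "$dest"                # real transfer
}
```

Run this from the local computer, not from the cluster. If the dry run fails (for example, because the host cannot be reached), the real transfer is skipped.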

To double-check that the file has been pushed to the cluster:

  1. Sign into the secure shell (ssh and password).
  2. Check to see that the file is in your user directory ($ ls).  

Note: The rsync process is the same for any file on your computer. Remember to always rsync from the directory that contains the file you are transferring to the cluster (/Users/username/Desktop, /Users/username/Downloads, etc.).

Note: Please avoid uploading files with spaces in their names. They can break the cluster.
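A file name can be checked for spaces before it is uploaded. This is a minimal sketch; the function name name_is_safe and the example file names are hypothetical.

```shell
# Hypothetical helper: succeed only if the file name contains no whitespace.
# Usage: name_is_safe "my data.fa" || mv "my data.fa" my_data.fa
name_is_safe() {
    case "$1" in
        *[[:space:]]*) return 1 ;;   # a space, tab, or newline was found
        *)             return 0 ;;
    esac
}
```

Renaming the file on the local computer (for example, replacing each space with an underscore) before running rsync avoids the problem.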