Standard Practices
Efficient and effective use of high-performance computing (HPC) resources requires adherence to established standard practices that optimize performance, ensure fair resource allocation, and minimize disruptions for all users. Following these guidelines will lead to more reliable job execution, better system performance, and improved collaboration within the HPC community. Whether you are a new user or an experienced researcher, incorporating these standard practices into your workflow will enhance your overall computing experience.
Please refer to our page on Acceptable Use for guidelines related to controlled data and federal regulations.
While all of the items below are important to cultivating a smooth HPC experience, critically important issues are highlighted.
Basic Skills
We don't expect users to be experts in HPC before they start, so if you're not yet comfortable working on the command line, or don't even know what HPC is, know that you are still welcome! We encourage active learning while you use HPC resources.
- Computer Literacy: HPC users should have a basic degree of computer literacy, or be actively learning it. This includes understanding the basics of files and folders, familiarity with basic hardware terms like processor and drive, and other common concepts.
- Linux Literacy: You don't need to be a Linux expert to start using HPC, but building this skill set will help you get the most out of our system. This includes tasks like managing files, submitting jobs, and troubleshooting common issues. If you're new to Linux, consider starting with some introductory resources to build confidence as you go; a few everyday commands are sketched after this list.
- Programming Skills: Some HPC workflows require minimal coding, while others can be code-intensive. These skills are often highly specific to a particular domain of research, making across-the-board support difficult. The general expectation is that users will receive support on domain-specific programming knowledge from their departments, collaborators, and mentors.
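If you are brand new to the command line, a handful of commands cover most day-to-day file management. The session below is a generic illustration of common Linux commands, not a transcript from our system; the directory and file names are hypothetical.

```bash
pwd                      # print the directory you are currently in
ls -lh                   # list the files here, with human-readable sizes
mkdir -p projects/demo   # create a folder (and any missing parent folders)
cd projects/demo         # move into that folder
cp ~/input.txt .         # copy a (hypothetical) file into the current folder
less input.txt           # page through the file; press q to quit
man rsync                # read the built-in manual page for any command
```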
Our support team is available to assist with HPC-related questions, troubleshoot issues, and guide users toward solutions. However, we are unable to provide comprehensive instruction on foundational skills like Linux, general computer literacy, or programming. To make the most of our support, we encourage users to actively seek and utilize training resources (including workshops we offer and online tutorials/courses) to build skills over time.
Proper Usage of Nodes
Our HPC system has a variety of different node types. It is expected that users are aware of the different types of nodes and utilize them properly.
- Login Nodes: The login nodes (named `junonia` and `wentletrap`) are the default landing place on the HPC after the bastion host. These nodes have access to the main HPC filesystem and can be used for accessing and managing files, including creating personal software environments. Do not run or compile code on the login nodes. Instead, use the compute nodes for these tasks (a minimal interactive session is sketched after this list). See Running Jobs for extensive details on how to properly run computations.
- Bastion Host: The bastion host (hostname `gatekeeper`) is a security feature used to verify credentials, and nothing else. This node is not to be used for storage, running jobs, or any task other than accessing the login nodes.
- Compute Nodes: These are the main computational engines of the HPC. You can run computationally intensive tasks on these nodes, including running code via interactive or batch jobs, and compiling code.
- High Memory Nodes: There are a small number of nodes with increased memory. Please respect that they are a limited resource in high demand. If your job is running out of memory, first increase the memory allocated on a standard node. Only after you have thoroughly tested your job and ruled out the possibility of using a standard node should you submit it to the high memory nodes.
- GPU Nodes: There are a small number of nodes with GPUs available. Please only submit jobs through the GPU partitions if they are properly configured to utilize GPUs.
- File Transfer Nodes: Also known as Data Transfer Nodes, or DTNs, these nodes have advanced networking capabilities to allow for faster transfers to and from the HPC. See our File Transfers documentation for details, and the transfer sketch after this list. Like the login nodes, these are meant for managing files, not running or compiling code.
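If you catch yourself about to compile or test code on a login node, move the work to a compute node first. The following is a minimal sketch of requesting a short interactive session, assuming (as the mentions of partitions and job arrays suggest) a SLURM scheduler; the account and partition values are hypothetical placeholders, so see Running Jobs for the options that apply to your group.

```bash
# Request one CPU and 4 GB of memory on a compute node for one hour.
# <your_group> and the partition name are hypothetical placeholders.
srun --account=<your_group> --partition=standard \
     --ntasks=1 --cpus-per-task=1 --mem=4gb --time=01:00:00 \
     --pty bash -i

# Once the prompt moves to a compute node, compiling is appropriate:
gcc -O2 -o my_program my_program.c
```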
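For moving data, point your transfer tool at a file transfer node rather than a login node. Below is a hedged rsync example; `<netid>` and `<dtn-hostname>` are placeholders, and the actual DTN hostname is given in our File Transfers documentation.

```bash
# Copy a local results folder to group storage through a Data Transfer Node.
# <netid>, <dtn-hostname>, and <pi_netid> are placeholders; see File Transfers.
rsync -avP results/ <netid>@<dtn-hostname>:/groups/<pi_netid>/results/
```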
Proper Usage of Compute Resources
- Limits: Be aware of and adhere to resource limits.
- No For-Loops for Job Submission: Do not use for-loops to submit multiple jobs. This can degrade the performance of the job scheduler and in extreme cases can cause it to crash, which temporarily disrupts all HPC users and creates additional work for the support team. If you would like to submit a large number of similar jobs, use job arrays (a minimal example is sketched below).
- Relinquish Unused Resources: When using Open OnDemand, jobs do not automatically terminate when you close the window. Please terminate sessions when your work is complete by clicking the red Delete button on the My Interactive Sessions page. Resources assigned to inactive sessions are unavailable to other users and slow the whole system down for everyone.
- Make Appropriate Resource Requests: When writing a batch script or filling out the resource request form in Open OnDemand, please do your best to estimate the number of CPUs and the amount of memory necessary to complete your work. Check the efficiency metrics of your completed jobs to help inform this estimate (see the sketch below). Do not request significantly more resources than necessary to run your jobs, as this leads to inefficiencies and makes the system slower for everyone.
Please review our guidelines on resource optimization to ensure that your jobs are being run efficiently.
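To make the job-array and resource-request points above concrete, here is a minimal sketch of a batch script, again assuming a SLURM scheduler. The account, partition, program, and file names are hypothetical placeholders to adapt to your own workflow.

```bash
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --account=<your_group>   # hypothetical placeholder
#SBATCH --partition=standard     # hypothetical placeholder
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1        # request only what the job actually needs
#SBATCH --mem=4gb
#SBATCH --time=02:00:00
#SBATCH --array=1-100            # one task per input file, no for-loop needed

# Each array task processes its own input, e.g. input_1.txt ... input_100.txt
./my_program "input_${SLURM_ARRAY_TASK_ID}.txt"
```

After a job finishes, an efficiency tool such as `seff <job_id>` (available on many SLURM systems) reports how much of the requested CPU time and memory was actually used, which you can use to right-size your next request.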
Proper Usage of Storage Resources
- Disk Usage: Your `/home` directory has a limit of 50GB. Filling up your home directory can cause difficulties accessing the HPC. The `/groups` folder allocates 500GB to PIs and their group members. Temporary storage is available in `/xdisk` by request. A quick way to check your usage is sketched after this list.
- File Limits: There are limits on the total number of files in a directory. Up to approximately one hundred thousand files is likely to be okay, but millions of files in one parent directory can slow down the filesystem and your jobs. Split files up across multiple parent folders if necessary.
- Respect Shared Spaces: If you are using a shared storage space like `/groups` or `/xdisk`, be mindful of your usage. If a shared folder is filled to capacity, no other members will be able to save data until usage drops back below the limit. Discuss best practices that fit your workflow with your PI and group members.
- Understand File Permissions: There are several key differences between how files are shared on the HPC and how files are shared on a personal computer or in the cloud (Google Drive, Dropbox, etc.). If you intend to share files with other HPC users, please review our guidelines on Linux File Permissions.
- Active Data Only: HPC storage is intended for active research data only. If data is not for research purposes, or if it is not for an active project, then it should be moved to a different platform. Options include HPC Rental Storage, Research Desktop-Attached Storage (RDAS), AWS Tier 2 Storage, or other third-party storage providers.
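The generic commands below can help you stay ahead of the disk and file-count limits described above. The paths are illustrative, and the system may also provide its own quota-reporting tool.

```bash
# How much space is my home directory using? (compare against the 50GB limit)
du -sh ~

# Which subfolders of the group space are largest? <pi_netid> is a placeholder.
du -sh /groups/<pi_netid>/* | sort -h

# How many files are in this directory tree? (watch for counts in the millions)
find ./my_dataset -type f | wc -l
```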
Proper Usage of Software Environments
- Dotfiles: Be careful when modifying your dotfiles. More details on dotfiles here. Please ensure that you know exactly what will change before modifying or removing dotfiles or folders; this may take some research if you are unfamiliar with them. A cautious editing pattern is sketched after this list.
- Conda: Anaconda is known to cause issues on HPC systems. We recommend using mamba instead.
- Environment Variables: These variables change the behavior of your environment, typically by changing the default paths where the interpreter will look for certain (often critical) items. Only modify your environment variables if you clearly understand the precise effect the change will have. Be extra careful when modifying environment variables in your `~/.bashrc`.
Default Folders: Be careful and intentional when installing packages in software environments. Often times, downloading items to the default locations, such as
~/.local
, can cause unintended side effects. It is often better to put custom software in custom folders with clear names and purposes.
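As a cautious pattern for the dotfile and environment-variable advice above: back up before you edit, make additive changes, and test in a fresh shell. This is a generic illustration, not required configuration; `~/software/bin` is a hypothetical install location.

```bash
# Keep a dated backup so a bad edit is easy to undo.
cp ~/.bashrc ~/.bashrc.bak.$(date +%Y%m%d)

# Prepend to PATH rather than overwriting it, so system defaults survive.
export PATH="$HOME/software/bin:$PATH"

# Check the result in a fresh login shell before trusting it in jobs.
bash -l -c 'echo $PATH'
```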
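For the Conda and Default Folders points, one hedged approach is to create named environments at explicit paths instead of letting tools write to `~/.local`. The environment names and paths below are hypothetical.

```bash
# Create a mamba environment at an explicit, clearly named path.
mamba create --prefix ~/envs/rnaseq-tools python=3.11

# Activate it by path so there is no ambiguity about what is loaded.
mamba activate ~/envs/rnaseq-tools

# For pip-based installs, a virtual environment keeps packages out of ~/.local.
python -m venv ~/envs/plotting
source ~/envs/plotting/bin/activate
pip install matplotlib
```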
If you have any questions about the standard practices listed above, please reach out to our support team, and we would be happy to provide additional clarification!