Speeding Up Filters & Rendering with Client-Server Configurations
For both of these configurations it is recommended that you start up a remote desktop and launch ParaView according to the instructions on the previous page, Getting Started With ParaView.
Using the Embedded Graphics Library (EGL) ParaView binary¶
Steps to launch an EGL server¶
This workflow uses offscreen rendering on a GPU to speed up interactive workflows.
First, request an allocation on Ocelote with one GPU. Please remember to replace <your account> with your own account name.
salloc -A <your account> -p standard --gres=gpu:1 -t 3:00:00 -N 1 -n 16
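Depending on how Slurm is configured, salloc may leave you in a shell on the login node rather than on the compute node. A quick sanity check, sketched below under the assumption that standard Slurm and NVIDIA tooling are available, is to get a shell on the allocated node and confirm the GPU is visible before downloading anything:
# If salloc leaves you on the login node, open a single-task shell on the compute node first
srun -n 1 --pty bash -i
# Confirm the GPU granted by --gres=gpu:1 is visible inside the job
nvidia-smi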
The following command downloads a version of ParaView that we can use for offscreen, headless rendering. The -O option renames the downloaded file to something shorter, which makes it easier to work with.
wget "https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.11&type=binary&os=Linux&downloadFile=ParaView-5.11.1-egl-MPI-Linux-Python3.9-x86_64.tar.gz" -O paraview_egl.tar.gz
Here we uncompress the tar file
tar xf paraview_egl.tar.gz
The folder we get from this will include the full ParaView version name, so we use a * pattern-match character to go into the bin folder where the pvserver program we need resides.
cd ParaView-5.11.1*/bin/
Here's where the exciting part happens. We will use the pvserver program to start a process that listens for connections from the graphical interface and fulfills requests on its behalf. The extra benefit is that it uses GPU hardware-accelerated rendering to return results to the viewport even faster. We will also be making use of the embedded graphics library, EGL, to perform these renderings without the need for an X server display. The --displays flag specifies the index of the GPU card we want to use, starting from 0; if we had another card we wished to use instead, we would pass --displays=1. Leveraging multiple GPUs is possible, but it requires a differently compiled version of ParaView with a specific NVIDIA license, so it will not be covered here.
./pvserver --displays=0
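If another user already holds the default port 11111 on a shared node, pvserver can be told to listen elsewhere. This optional variation uses pvserver's --server-port flag; the client then has to be pointed at the same port when you connect:
# Optional: listen on a non-default port if 11111 is already in use
./pvserver --displays=0 --server-port=11112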
For completeness, here are the commands all together for copying and pasting into a terminal on the HPC.
wget "https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.11&type=binary&os=Linux&downloadFile=ParaView-5.11.1-egl-MPI-Linux-Python3.9-x86_64.tar.gz" -O paraview_egl.tar.gz
tar xf paraview_egl.tar.gz
cd ParaView-5.11.1*/bin/
./pvserver --displays=0
Take note of the address that is written to the screen at this point; it will look something like cs://<node_name>.<cluster_name>.arizona.edu:11111 and will be used when we connect via the GUI client.
Steps to launch a client to connect to the server¶
To understand the process of connecting from the graphical user interface, rely on the documentation provided on the ParaView Read the Docs page: https://docs.paraview.org/en/latest/ReferenceManual/parallelDataVisualization.html#configuring-a-server-connection
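As a convenience, ParaView also lets you predefine the connection in a servers file so it appears in the File > Connect dialog. The sketch below is one possible way to do this, assuming the Linux client reads user-defined servers from ~/.config/ParaView/servers.pvsc; the hostname is a placeholder that you should replace with the address your pvserver printed, and note that this overwrites any existing servers.pvsc.
# Hypothetical servers.pvsc entry; replace the host with the address printed by pvserver
mkdir -p ~/.config/ParaView
cat > ~/.config/ParaView/servers.pvsc <<'EOF'
<Servers>
  <Server name="hpc-egl" resource="cs://<node_name>.<cluster_name>.arizona.edu:11111">
    <ManualStartup/>
  </Server>
</Servers>
EOF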
Once the connection has been made, you can check how heavily the GPU is being utilized by suspending the running ./pvserver with Ctrl-Z and then typing bg to resume it in the background. At this point you can run nvidia-smi -l 2 to get a continuously refreshing text table of GPU activity.
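Put together, that monitoring step looks like this (Ctrl-Z is pressed in the terminal where pvserver is running):
# Press Ctrl-Z in the pvserver terminal to suspend it, then resume it in the background
bg
# Print GPU utilization every 2 seconds while you interact with the client
nvidia-smi -l 2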
Using multinode MPI and OSMesa to speed up filtering and rendering¶
This process is very similar to the above, but we will request a slightly different allocation. For testing, use the following command on El Gato to get two full nodes of 16 CPU cores each, for a total of 32 cores, to speed up filtering and rendering with ParaView. After this tutorial, feel free to test out different machines or node and core counts; the rule is that the total number of CPU cores is specified with the -n flag and the total number of nodes with the -N flag. For more information, consult the basic Slurm scheduling pages. Please remember to replace <your account> with your own account name.
interactive -a <your account> -N 2 -n 32 -t 1:00:00
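The interactive command is a convenience wrapper around Slurm; if you prefer to request the allocation directly, a roughly equivalent salloc call, mirroring the flags used in the EGL section above, might look like this:
# Roughly equivalent direct Slurm request: 2 nodes, 32 total tasks, 1 hour
salloc -A <your account> -p standard -N 2 -n 32 -t 1:00:00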
When the first of the nodes is allocated, go ahead and download the latest ParaView OSMesa build.
wget "https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.11&type=binary&os=Linux&downloadFile=ParaView-5.11.1-osmesa-MPI-Linux-Python3.9-x86_64.tar.gz" -O paraview_mesa.tar.gz
The next step is to untar it
tar xf paraview_mesa.tar.gz
Then, at this point, you are ready to navigate into the bin folder
cd ParaView*/bin
Now use the provided mpiexec binary to start up a pvserver backed by all of our allocated cores
./mpiexec -n 32 ./pvserver
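If you requested a core count other than 32, the -n value must match your allocation. One way to avoid hard-coding it, assuming the Slurm environment variables are still set in your shell, is to use $SLURM_NTASKS:
# Launch one pvserver rank per allocated Slurm task instead of hard-coding the count
./mpiexec -n "$SLURM_NTASKS" ./pvserver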
Once again, this is an offscreen rendering system able to run without needing any display. A side benefit is that, since this is a complete OSMesa build, there are no missing OpenGL dependencies to cause problems. Again, we need to pay attention to the output of this command so that we can see the server's connection URL. For instance, if the MPI nodes are gpu1 and gpu2, it will look like cs://gpu2.elgato.hpc.arizona.edu:11111
At this point we are all set to start up the ParaView GUI in the remote desktop allocation. Again, please refer back to the Getting Started with ParaView documentation for instructions on this (TODO: replace this with the link in Confluence).
Once the GUI is open, configure a new connection: put in just the server address, i.e. gpu2.elgato.hpc.arizona.edu, for the host, then supply the port as 11111.
After about 10-30 seconds you will be connected, and all of the filtering and rendering will be accelerated by the parallel multinode MPI server.
To double-check whether your cores are being fully utilized, consider the following: open a separate terminal, ssh into each of the MPI nodes, and run htop to see their individual CPU core utilization. As you interact with the ParaView GUI client, the multinode MPI server will go through periods of low and high activity, observable in the htop display.
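If you are unsure which nodes your job landed on, you can list them from inside the allocation and then ssh to each one. A sketch using standard Slurm tooling, with <node_name> as a placeholder:
# List the hostnames in this allocation (run inside the Slurm job)
scontrol show hostnames "$SLURM_JOB_NODELIST"
# Then, from a separate terminal, watch the cores on each node
ssh -t <node_name> htop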
Please consult the video below for any additional questions, or reach out to vislab-consult@list.arizona.edu
**Note that this video uses fewer MPI resources because waiting on two full El Gato nodes wasn't allocating very quickly and made it harder to record the video. Just remember that instead of -n 8 you can use -n 32 and then go out for a quick snack.**