Installing environment modules on Xubuntu

In addition to Conda, [environment modules](https://en.wikipedia.org/wiki/Environment_Modules_(software)) provide users with a convenient way to switch software environments on Linux machines. This approach is widely used on computer clusters that offer computational services to a large number of users, where the environment modules are shared by authorised users. These modules are not Linux kernel modules, which the OS loads automatically at start-up; instead, users load them manually into their shell sessions. I learnt how to use module commands for bioinformatic analysis when I was studying at the University of Melbourne. Loading a module essentially modifies environment variables such as `$PATH`. In this post, I set up a module manager for users of my Xubuntu system.



1. Installation

1.1. Compiling and installing software Modules

I downloaded the source code of Environment Modules 4.3.1 from SourceForge and followed online instructions (References 1 and 2) to build the program using GCC 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1).

# Install TCL, a prerequisite of Modules
sudo apt-get install tcl-dev tk-dev  # TCL 8.6 was installed.

# Build and install Modules
cd ~/Downloads  # Where modules-4.3.1.tar.gz was saved
tar -xzf modules-4.3.1.tar.gz  # Extract source code
cd modules-4.3.1
./configure
make  # No need to run "make -C doc all"
sudo make install  # Installed to /usr/local/Modules/
make distclean  # Thoroughly delete compiled files

At this stage, the command module is not yet available. Modules must be initiated in every new terminal session before it can be used. To initiate it:

source /usr/local/Modules/init/bash

module avail  # Now this command works
----------------------------- /usr/local/Modules/modulefiles ---------------------------
dot  module-git  module-info  modules  null  use.own
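Before relying on the command module in a script, it can help to check whether the current shell already defines it. The following is only a minimal sketch for Bash, assuming the default installation prefix used in this post; the variable names init_script and modules_status are illustrative and not part of Modules.

```shell
# Path of the Bash initiation script under the default prefix (as used above)
init_script=/usr/local/Modules/init/bash

if command -v module > /dev/null 2>&1; then
    # The shell already knows the command, so nothing needs to be sourced
    modules_status="already initiated"
elif [ -r "$init_script" ]; then
    . "$init_script"  # Same effect as "source" in Bash
    modules_status="initiated now"
else
    modules_status="not installed under the default prefix"
fi
echo "Modules: $modules_status"
```

On a machine without Modules, the sketch simply reports that nothing was found instead of failing.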

1.2. Directory structure and modulefiles of the installation

Components of Environment Modules are installed under the directory specified by the prefix parameter of the script configure, as its help message shows:

./configure --help
Depending on the above configuration options the files are approximately
placed in the following directory structure:
  PREFIX/
    bin/
    etc/
    init/
    lib/
    libexec/
    share/
      doc/
      man/
        man1/
        man4/
      vim/
        vimfiles/
    modulefiles/

We can inspect the sub-directories of the Modules directory and list the modulefiles, which are Tcl scripts that users load, unload, or switch using the command module.

ls -l /usr/local/Modules/  # Show sub-directories
drwxr-xr-x 2 root root 4096 Oct 28 06:45 bin
drwxr-xr-x 2 root root 4096 Oct 28 06:45 etc
drwxr-xr-x 4 root root 4096 Oct 28 06:56 init
drwxr-xr-x 2 root root 4096 Oct 28 06:45 lib
drwxr-xr-x 2 root root 4096 Oct 28 06:45 libexec
drwxr-xr-x 2 root root 4096 Oct 28 06:45 modulefiles
drwxr-xr-x 5 root root 4096 Oct 28 06:45 share

ls -l /usr/local/Modules/modulefiles/  # A list of modulefiles
-rw-r--r-- 1 root root  346 Oct 28 06:56 dot
-rw-r--r-- 1 root root  558 Oct 28 06:56 module-git
-rw-r--r-- 1 root root 2272 Oct 28 06:56 module-info
-rw-r--r-- 1 root root  751 Oct 28 06:56 modules
-rw-r--r-- 1 root root  340 Oct 28 06:56 null
-rw-r--r-- 1 root root 1398 Oct 28 06:56 use.own

The names of the modulefiles match the output of the command module avail. New modules are added to the Modules system by creating new modulefiles in the directory modulefiles.

A remark on the installation directory

The default installation directory of Environment Modules is /usr/local/Modules/. In theory, we can install the module system to a non-default location using the prefix parameter of the script configure (for instance, sudo mkdir /mnt/program/modules; ./configure --prefix=/mnt/program/modules; make; make install). In my case, however, the system did not work without further manual adjustment, because the initiation scripts assumed the default location (hence the command source /mnt/program/modules/init/bash failed). In short, it is preferable to use the default installation directory unless we have a compelling reason not to.

1.3. Automatic initiation of Modules for users

Environment Modules must be initiated in the shell before the command module becomes available. This is done by the script /usr/local/Modules/init/profile.sh or profile.csh (depending on which shell you are using). It is preferable to carry out the initiation automatically whenever a shell starts. Here, I demonstrate a global way to initiate Modules in every terminal session for all users (Reference 1). Note that since the default permission of /usr/local/Modules/init/sh is 644 (rw-r--r--), users cannot execute it directly. Instead, the command source is used, which also ensures that the script runs in the current shell.

Step 1: link the initiation scripts under the directory accessed at the shell’s start-up

sudo ln -s /usr/local/Modules/init/profile.sh /etc/profile.d/modules.sh
sudo ln -s /usr/local/Modules/init/profile.csh /etc/profile.d/modules.csh

Step 2: modify the system file /etc/bash.bashrc to apply the initiation to every user

echo -e "\n# For initiating Modules" | sudo tee -a /etc/bash.bashrc > /dev/null  # Append a line to the end of this file with no return message.
echo ". /etc/profile.d/modules.sh" | sudo tee -a /etc/bash.bashrc > /dev/null

Step 3: restart the terminal. The Modules system is then enabled (namely, the command module avail returns a list of installed modules).
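Running Step 2 more than once appends the same lines to /etc/bash.bashrc again. As a hedged sketch, the guard below appends a line only when it is not yet present. It works on a scratch file (rc_file, a hypothetical stand-in for /etc/bash.bashrc) so that it can be tried without sudo; append_once is an illustrative helper name, not a standard command.

```shell
# Scratch file standing in for /etc/bash.bashrc (lets the sketch run without sudo)
rc_file=$(mktemp)
init_line=". /etc/profile.d/modules.sh"

append_once() {
    # Append line $2 to file $1 only if no identical whole line exists yet
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

append_once "$rc_file" "$init_line"
append_once "$rc_file" "$init_line"  # The second call changes nothing

# Count whole-line matches to confirm there is no duplicate
line_count=$(grep -cxF -- "$init_line" "$rc_file")
echo "Matching lines: $line_count"
rm -f "$rc_file"
```

For the real system file, the append itself would still have to go through sudo tee -a as in Step 2, with the same grep test placed in front of it.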

What does profile.sh do?

This script sources (runs) the module initiation script /usr/local/Modules/init/bash:

less /usr/local/Modules/init/profile.sh  # /etc/profile.d/modules.sh

# get current shell name by querying shell variables or looking at parent
# process name
if [ -n "${BASH:-}" ]; then
   shell=${BASH##*/}
elif [ -n "${ZSH_NAME:-}" ]; then
   shell=$ZSH_NAME
else
   shell=$(/usr/bin/basename $(/bin/ps -p $$ -ocomm=))
fi

if [ -f /usr/local/Modules/init/$shell ]; then  # On Xubuntu, it returns "bash".
   . /usr/local/Modules/init/$shell  # The same as command "source /usr/local/Modules/init/$shell"
else
   . /usr/local/Modules/init/sh
fi

Therefore, we can manually execute the command . /etc/profile.d/modules.sh to initiate the module system.
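The shell detection in profile.sh relies on the parameter expansion ${BASH##*/}. The sketch below reproduces that step in isolation with an example value; bash_path, shell_name, init_dir, and chosen are illustrative names, and /bin/bash is only an assumed value of $BASH.

```shell
# Example value of $BASH (the full path of the running Bash binary)
bash_path=/bin/bash

# "##*/" removes the longest prefix matching "*/", leaving only the file name
shell_name=${bash_path##*/}
echo "$shell_name"

# The script to source is then chosen by file name, with sh as the fallback
init_dir=/usr/local/Modules/init
if [ -f "$init_dir/$shell_name" ]; then
    chosen="$init_dir/$shell_name"
else
    chosen="$init_dir/sh"
fi
echo "Would source: $chosen"
```

The same expansion yields "zsh" or "ksh" for other shells, which is how one profile script can serve several of them.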

Notes

  • We do not need to use the command echo -e "/usr/local/Modules/modulefiles\n" because a newline character will be attached to the output string automatically.
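  • The command which module always returns nothing, because module is defined as a shell function by the initiation script rather than installed as an executable on $PATH.
  • I could not find or install the program using apt-get or Ubuntu’s Software & Updates GUI, although Reference 3 gives the following commands.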
  sudo apt-get update
  sudo apt-get install environment-modules
  • The solution given in Reference 2 for initiation (shown below) did not work for me, because it did not resolve the problem “Error found when loading /etc/profile: /etc/profile.d/modules.sh: line 6: /user/share/modules/init/sh: No such file or directory”
sudo nano /etc/profile  # Add "source /usr/local/Modules/init/bash" (which executes the file .../bash in the CURRENT shell) to the end of it, and logout and login again.
  • An alternative but not recommended way to automatically initiate Modules is appending the aforementioned command to the .bashrc in every user’s home directory:
echo "# For initiating Modules" >> ~/.bashrc
echo ". /etc/profile.d/modules.sh" >> ~/.bashrc
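The observation that which module prints nothing is consistent with module being a shell function: which only searches $PATH for executable files. The sketch below illustrates the difference with a hypothetical stand-in function, demo_module, so that it runs even where Modules is not installed.

```shell
# Hypothetical stand-in for the real "module" shell function
demo_module() { echo "demo_module called with: $*"; }

# Search every directory in $PATH for an executable file of that name,
# which is essentially what "which" does
found_on_path=no
old_ifs=$IFS
IFS=:
for dir in $PATH; do
    if [ -f "$dir/demo_module" ] && [ -x "$dir/demo_module" ]; then
        found_on_path=yes
    fi
done
IFS=$old_ifs

# "command -v" also knows about functions and prints the function name
resolved=$(command -v demo_module)
echo "Executable on PATH: $found_on_path"
echo "command -v reports: $resolved"
```

In an initiated Bash session, type module gives the same kind of answer for the real command.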



2. Maintenance

2.1. Installing modules

In this section, I demonstrate the installation of MPICH and MrBayes as two modules.

2.1.1. MPICH for parallel computing

Online installation instructions & releases

cd ~/Downloads
wget http://www.mpich.org/static/downloads/3.3.1/mpich-3.3.1.tar.gz
tar -xzf mpich-3.3.1.tar.gz  # MPICH 3.3.1 will be installed as a module for users.

# First, compile and install MPICH as normal software
cd mpich-3.3.1
./configure --disable-fortran  # Do not build the MPI Fortran library.
make  # Both configure and make may take a while to finish.
sudo make install
mpiexec --version  # Check whether the software is successfully installed.

# Next, create a modulefile for mpiexec (cf. Reference 4 for other examples)
which mpiexec
/usr/local/bin/mpiexec  # Installation path

sudo mkdir /usr/local/Modules/modulefiles/mpi
sudo nano /usr/local/Modules/modulefiles/mpi/mpihc-3.3.1  # Add commands (see below)

# (Optional) Specify the default version of MPI modules
sudo nano /usr/local/Modules/modulefiles/mpi/.version  # Add a command (see below)

Content of the modulefile mpihc-3.3.1:

#%Module1.0 ########################################

# A brief introduction of this module
proc ModulesHelp { } {
    global dotversion
    puts stderr "\tMPICH-3.3.1"
}

# Define a new module
module-whatis "MPICH-3.3.1"
# Where MPI programs can be found (a Tcl comment must start its own command, so it goes on a separate line)
prepend-path PATH /usr/local/bin/

Content of the version file .version:

#%Module1.0 ########################################
# The same as the modulefile name
set ModulesVersion "mpihc-3.3.1"

Now the connection to MPICH is completed, and the new module appears in the module list (no need to restart the terminal):

module avail
------------------------- /usr/local/Modules/modulefiles ---------------------------
dot  module-git  module-info  modules  mpi/mpihc-3.3.1(default)  null  use.own
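As an alternative to editing the files in nano, the modulefile and the version file can be written from the command line with here-documents. The sketch below is only an illustration: it writes into a scratch directory (modroot, a hypothetical stand-in for /usr/local/Modules/modulefiles) so that it can be tried without sudo, and it keeps the file name used in this post.

```shell
# Scratch directory standing in for /usr/local/Modules/modulefiles
modroot=$(mktemp -d)
mkdir -p "$modroot/mpi"

# The quoted 'EOF' stops the shell from expanding anything inside the Tcl code
cat > "$modroot/mpi/mpihc-3.3.1" << 'EOF'
#%Module1.0
proc ModulesHelp { } {
    puts stderr "\tMPICH-3.3.1"
}
module-whatis "MPICH-3.3.1"
prepend-path PATH /usr/local/bin
EOF

# Version file marking the default version within the mpi directory
cat > "$modroot/mpi/.version" << 'EOF'
#%Module1.0
set ModulesVersion "mpihc-3.3.1"
EOF

# Every modulefile must begin with the "#%Module" magic cookie
first_line=$(head -n 1 "$modroot/mpi/mpihc-3.3.1")
echo "Magic cookie: $first_line"
```

For the real directory, each cat > file would be replaced by sudo tee file > /dev/null, because the redirection itself does not run with root privileges. The scratch directory can be removed afterwards with rm -r "$modroot".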

The installed MPICH package includes the following compiler wrappers:

  • mpicc: C compiler at /usr/local/bin/mpicc;
  • mpicxx: C++ compiler at /usr/local/bin/mpicxx, with a symbolic link /usr/local/bin/mpic++.
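To confirm where the shell finds these programs, a short loop over their names is enough. This is a minimal sketch; report_tool is an illustrative helper, and on a machine without MPICH it simply prints "missing" for each name.

```shell
report_tool() {
    # Print where the shell would find command $1, or "missing"
    tool_path=$(command -v "$1" 2> /dev/null) || tool_path="missing"
    echo "$1: $tool_path"
}

for tool in mpicc mpicxx mpic++ mpiexec; do
    report_tool "$tool"
done
```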

In summary:

  • TCL commands in modulefiles show that, in essence, loading a module is adding paths of required components to $PATH as well as removing paths of conflicting components from $PATH.
  • Modules are only loaded to the current session. Consequently, all loaded modules are unloaded once the current terminal is closed.
  • For every module, the relative path under the modulefiles directory becomes the module name in the list printed by command module avail. For instance, “mpi/mpihc-3.3.1” in the path “/usr/local/Modules/modulefiles/mpi/mpihc-3.3.1”, and “module-info” in the path “/usr/local/Modules/modulefiles/module-info” become two module names.
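The last point can be sketched with a shell parameter expansion that strips the modulefiles root from a path. The function module_name is purely illustrative and is not a command provided by Modules.

```shell
modroot=/usr/local/Modules/modulefiles

module_name() {
    # Strip the modulefiles root and the following slash from path $1
    name=${1#"$modroot"/}
    echo "$name"
}

module_name "$modroot/mpi/mpihc-3.3.1"
module_name "$modroot/module-info"
```

The two calls print the names under which the two example modulefiles are listed.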

2.1.2. Parallel version of MrBayes for Bayesian phylogenetic reconstruction

Ideally, the Linux version of MrBayes uses the Beagle library and MPI for high speed. Accordingly, in this section, I create a MrBayes module using these two techniques. Notably, it is quite tricky to build a parallel version of MrBayes on Ubuntu. See this online guidance for details.

Installing the Beagle library

# Install prerequisites
sudo apt-get install build-essential autoconf automake libtool git pkg-config openjdk-8-jdk  # No openjdk-9-jdk is available by 29 Oct 2019. OpenJDK: /usr/lib/jvm/java-8-openjdk-amd64/bin/

# Build the Beagle library 3.1.2 and install it at a non-default location
wget https://github.com/beagle-dev/beagle-lib/archive/v3.1.2.tar.gz
mv v3.1.2.tar.gz Beagle_3.1.2.tar.gz
tar -xzf Beagle_3.1.2.tar.gz
cd beagle-lib-3.1.2/
./autogen.sh
sudo mkdir /usr/local/lib/beagle
./configure LDFLAGS=-Wl,-rpath=/usr/local/lib/beagle --prefix=/usr/local/lib/beagle  # As per Installation guide 2; however, I did not use the option --without-jdk. The --prefix option is needed despite the presence of -rpath.
make -j4  # Compile code using 4 parallel jobs (machine dependent)
sudo make install  # Library location: /usr/local/lib/beagle/lib (do not rename the lib directory)

# Create a modulefile
sudo mkdir /usr/local/Modules/modulefiles/beagle
sudo nano /usr/local/Modules/modulefiles/beagle/beagle-lib-3.1.2

The content of the modulefile is shown below. The environment variable LD_LIBRARY_PATH must be set before running MrBayes (see the online instructions), and making Beagle a module satisfies this requirement automatically whenever the module is loaded.

#%Module1.0 ########################################
proc ModulesHelp { } {
    global dotversion
    puts stderr "\tbeagle-lib-3.1.2"
}
module-whatis "Beagle-lib-3.1.2"

# Set environment variables
prepend-path PATH /usr/local/lib/beagle/lib
prepend-path LD_LIBRARY_PATH /usr/local/lib/beagle/lib
prepend-path PKG_CONFIG_PATH /usr/local/lib/beagle/lib/pkgconfig

Check the configuration:

module avail
-------------------------------- /usr/local/Modules/modulefiles ------------------------
beagle/beagle-lib-3.1.2  module-git   modules                   null     
dot                      module-info  mpi/mpihc-3.3.1(default)  use.own  

# Load the module
module load beagle/beagle-lib-3.1.2

echo $PATH
/usr/local/lib/beagle/lib:/usr/local/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

echo $LD_LIBRARY_PATH
/usr/local/lib/beagle/lib

echo $PKG_CONFIG_PATH
/usr/local/lib/beagle/lib/pkgconfig

# Unload the module
module unload beagle/beagle-lib-3.1.2

echo $PATH  # Returns to default
/usr/local/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

# Now both echo $LD_LIBRARY_PATH and echo $PKG_CONFIG_PATH return nothing.

So loading and unloading a module is essentially setting environment variables.
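The effect on a colon-separated variable can be imitated with two small shell functions. This is a simplified model of what loading and unloading do to such a variable, not the actual implementation in Modules; prepend_dir and remove_dir are illustrative names.

```shell
prepend_dir() {
    # Print list $1 with directory $2 placed in front
    if [ -n "$1" ]; then echo "$2:$1"; else echo "$2"; fi
}

remove_dir() {
    # Print list $1 without any entry equal to directory $2
    result=""
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        [ "$entry" = "$2" ] && continue
        result="${result:+$result:}$entry"
    done
    IFS=$old_ifs
    echo "$result"
}

base="/usr/local/bin:/usr/bin:/bin"
loaded=$(prepend_dir "$base" /usr/local/lib/beagle/lib)    # Like "module load"
echo "$loaded"
unloaded=$(remove_dir "$loaded" /usr/local/lib/beagle/lib)  # Like "module unload"
echo "$unloaded"
```

The second value equals the starting list again, which mirrors how $PATH returned to its default after module unload.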

Compiling MrBayes

To build a MrBayes program that utilises MPI parallelisation and the Beagle library:

# Download source code from GitHub
git clone --depth=1 https://github.com/NBISweden/MrBayes.git  # A shallow clone
cd MrBayes/

# Check dependencies and install the missing ones
sudo apt-get install pkg-config automake autoconf libreadline-dev libtool make autoconf-archive  # autoconf-archive, pkg-config 0.29.1, automake 1.15.1, and autoconf 2.69

# Build MrBayes on Ubuntu
# See A. K. Kähäri's response at https://github.com/NBISweden/MrBayes/issues/136
# BEAGLE_CFLAGS points to the directory including a sub-directory libhmsbeagle, which contains beagle.h and platform.h.
# This variable must not be "-I/usr/local/lib/beagle/lib/pkgconfig/hmsbeagle-1", or Issue 2 in Section Troubleshooting arises.
env MPICC="/usr/local/bin/mpicc" PKG_CONFIG_PATH="/usr/local/lib/beagle/lib/pkgconfig" BEAGLE_CFLAGS="-I/usr/local/lib/beagle/include/libhmsbeagle-1" BEAGLE_LIBS="-L/usr/local/lib/beagle/lib -Wl,-rpath=/usr/local/lib/beagle/lib -lhmsbeagle" ./configure --with-mpi

make -j4  # Compile code
sudo make install  # See Installation guidance 1 for destination directories
which mb
/usr/local/bin/mb

make clean

Done.

Creating a modulefile for MrBayes

sudo mkdir /usr/local/Modules/modulefiles/mrbayes
sudo nano /usr/local/Modules/modulefiles/mrbayes/3.2.7a

The content of the modulefile is shown below. Since mb is located in the default software directory /usr/local/bin, we do not need the command prepend-path PATH /usr/local/bin. Nonetheless, we can use the command module load inside the modulefile to set up the environment variables required by MrBayes.

#%Module1.0 ########################################
proc ModulesHelp { } {
    global dotversion
    puts stderr "\tMrBayes-3.2.7a"
}
module-whatis "MrBayes-3.2.7a"

# Set environment variables for MrBayes
module load beagle/beagle-lib-3.1.2

Check the configuration:

module avail
---------------------------- /usr/local/Modules/modulefiles ----------------------------
beagle/beagle-lib-3.1.2  module-git   modules                   mrbayes/3.2.7a  use.own
dot                      module-info  mpi/mpihc-3.3.1(default)  null

module load mrbayes/3.2.7a 
Loading mrbayes/3.2.7a
  Loading requirement: beagle/beagle-lib-3.1.2

module list
Currently Loaded Modulefiles:
 1) beagle/beagle-lib-3.1.2   2) mrbayes/3.2.7a
 
echo $LD_LIBRARY_PATH
/usr/local/lib/beagle/lib

echo $PKG_CONFIG_PATH
/usr/local/lib/beagle/lib/pkgconfig

mb -v
MrBayes, Bayesian Analysis of Phylogeny

Version:   3.2.7a
Features:  SSE AVX Beagle MPI
Host type: x86_64-unknown-linux-gnu (CPU: x86_64)
Compiler:  gnu 7.4.0

module unload mrbayes/3.2.7a  # Its dependency will be unloaded as well. In addition, the aforementioned two environment variables are also deleted.
Unloading mrbayes/3.2.7a
  Unloading useless requirement: beagle/beagle-lib-3.1.2

The module MrBayes shows correct information and seems to work as expected.

Troubleshooting

Issue 1: incorrect path specification of the Beagle library for ./configure results in errors at the linking stage:

/usr/bin/ld: cannot find -lhmsbeagle
collect2: error: ld returned 1 exit status
Makefile:402: recipe for target 'mb' failed
make[2]: *** [mb] Error 1
make[2]: Leaving directory '/home/yu/Downloads/MrBayes/src'
Makefile:310: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/yu/Downloads/MrBayes/src'
Makefile:418: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

Issue 2: header files libhmsbeagle/beagle.h and libhmsbeagle/platform.h cannot be found when using the following commands to build MrBayes.

env MPICC="/usr/local/bin/mpicc" PKG_CONFIG_PATH="/usr/local/lib/beagle/lib/pkgconfig" ./configure --with-mpi --with-beagle="/usr/local/lib/beagle/lib"  # Adding BEAGLE_CFLAGS="-I/usr/local/lib/beagle/include/libhmsbeagle-1" does not help.

Error messages about libhmsbeagle/beagle.h:

In file included from likelihood.c:38:0:
bayes.h:63:10: fatal error: libhmsbeagle/beagle.h: No such file or directory

 #include "libhmsbeagle/beagle.h"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:479: recipe for target 'mb-likelihood.o' failed
make[2]: *** [mb-likelihood.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from bayes.c:38:0:
bayes.h:63:10: fatal error: libhmsbeagle/beagle.h: No such file or directory
 #include "libhmsbeagle/beagle.h"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from best.c:18:0:
bayes.h:63:10: fatal error: libhmsbeagle/beagle.h: No such file or directory
 #include "libhmsbeagle/beagle.h"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:437: recipe for target 'mb-bayes.o' failed
make[2]: *** [mb-bayes.o] Error 1
Makefile:451: recipe for target 'mb-best.o' failed
make[2]: *** [mb-best.o] Error 1
In file included from command.c:38:0:
bayes.h:63:10: fatal error: libhmsbeagle/beagle.h: No such file or directory
 #include "libhmsbeagle/beagle.h"
          ^~~~~~~~~~~~~~~~~~~~~~~

Error messages about libhmsbeagle/platform.h (when beagle.h is present in MrBayes/src/libhmsbeagle):

In file included from bayes.h:63:0,
                 from bayes.c:38:
libhmsbeagle/beagle.h:111:10: fatal error: libhmsbeagle/platform.h: No such file or directory
 #include "libhmsbeagle/platform.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from bayes.h:63:0,
                 from likelihood.c:38:
libhmsbeagle/beagle.h:111:10: fatal error: libhmsbeagle/platform.h: No such file or directory
 #include "libhmsbeagle/platform.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
compilation terminated.
Makefile:437: recipe for target 'mb-bayes.o' failed
make[2]: *** [mb-bayes.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Makefile:479: recipe for target 'mb-likelihood.o' failed
make[2]: *** [mb-likelihood.o] Error 1
In file included from bayes.h:63:0,
                 from best.c:18:
libhmsbeagle/beagle.h:111:10: fatal error: libhmsbeagle/platform.h: No such file or directory
 #include "libhmsbeagle/platform.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:451: recipe for target 'mb-best.o' failed
make[2]: *** [mb-best.o] Error 1
In file included from bayes.h:63:0,
                 from command.c:38:
libhmsbeagle/beagle.h:111:10: fatal error: libhmsbeagle/platform.h: No such file or directory
 #include "libhmsbeagle/platform.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:465: recipe for target 'mb-command.o' failed
make[2]: *** [mb-command.o] Error 1
make[2]: Leaving directory '/home/yu/Downloads/MrBayes/src'
Makefile:310: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/yu/Downloads/MrBayes/src'
Makefile:418: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

These header files are installed by Beagle, and in my case, they are located in the directory /usr/local/lib/beagle/include/libhmsbeagle-1/libhmsbeagle/. However, the header file ./src/bayes.h of MrBayes includes beagle.h through the C directive #include "libhmsbeagle/beagle.h", and beagle.h in turn includes platform.h through #include "libhmsbeagle/platform.h". Both paths are relative, so the compiler must be pointed at the parent directory /usr/local/lib/beagle/include/libhmsbeagle-1. I solved this problem either by creating the following symbolic links in the directory src (see GitHub Issue #136) or simply by using the command line given in Section Compiling MrBayes.

mkdir ./src/libhmsbeagle
ln -s /usr/local/lib/beagle/include/libhmsbeagle-1/libhmsbeagle/beagle.h ./src/libhmsbeagle/beagle.h
ln -s /usr/local/lib/beagle/include/libhmsbeagle-1/libhmsbeagle/platform.h ./src/libhmsbeagle/platform.h

Then MrBayes can be built successfully.
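Before running ./configure, it may save time to check that a candidate include directory really resolves both headers. The following is a hedged sketch: has_beagle_headers is an illustrative helper, and the two directories tested are the ones discussed in this post.

```shell
has_beagle_headers() {
    # Succeed if both headers resolve relative to include directory $1,
    # that is, the directory that would follow -I in BEAGLE_CFLAGS
    [ -f "$1/libhmsbeagle/beagle.h" ] && [ -f "$1/libhmsbeagle/platform.h" ]
}

for dir in /usr/local/lib/beagle/include/libhmsbeagle-1 \
           /usr/local/lib/beagle/lib/pkgconfig/hmsbeagle-1; do
    if has_beagle_headers "$dir"; then
        echo "usable for -I: $dir"
    else
        echo "headers not found under: $dir"
    fi
done
```

A directory passes only if it is the parent of libhmsbeagle, which is why the pkgconfig path cannot work as an include directory.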

2.2. Uninstalling or renaming modules

Since Modules finds modules by reading the modulefiles directory, deleting a modulefile removes the corresponding module. Similarly, we can rename a module by changing the directory or name of its modulefile.
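Both operations can be tried safely on a scratch copy of the directory layout. In the sketch below, modroot is a hypothetical stand-in for /usr/local/Modules/modulefiles, the modulefiles are empty placeholders, and list_modules is an illustrative helper that prints names the way they would appear as module names.

```shell
modroot=$(mktemp -d)  # Scratch stand-in for /usr/local/Modules/modulefiles
mkdir -p "$modroot/mrbayes" "$modroot/beagle"
touch "$modroot/mrbayes/3.2.7a" "$modroot/beagle/beagle-lib-3.1.2"

list_modules() {
    # Print file paths relative to root $1, skipping hidden files such as .version
    (cd "$1" && find . -type f ! -name '.*' | sed 's|^\./||' | sort)
}

# Rename a module by renaming its modulefile
mv "$modroot/beagle/beagle-lib-3.1.2" "$modroot/beagle/3.1.2"
# Uninstall a module by deleting its modulefile
rm "$modroot/mrbayes/3.2.7a"

remaining=$(list_modules "$modroot")
echo "$remaining"
```

On the real system, a modulefile that loads another module by name, such as the MrBayes modulefile with its module load beagle/beagle-lib-3.1.2 line, has to be updated whenever that dependency is renamed.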



References

  1. https://modules.readthedocs.io/en/latest/INSTALL.html
  2. https://www.ivofilot.nl/posts/view/23/How+to+install+and+use+the+environment+modules+system.
  3. https://www.howtoinstall.co/en/ubuntu/trusty/environment-modules
  4. https://blog.geoghegan.me/linux/installing-environment-modules
  5. http://www.walkingrandomly.com/?p=5680