2023-08-20 12:00:00
2023-08-03 12:00:00
2023-07-18 08:00:00
When working in an HPC environment, “dependency hell” is the most bothersome nuisance of all. (Yep, I really do regard it as “the most”, not “one of the most”.)
For instance, in R, users often receive messages such as:
"Error: package or namespace load failed for XXX:package YYY was installed before R 4.0.0: please re-install it"
The solution is relatively easy: re-install the YYY package under your current version of R (or upgrade to the latest R and re-install it there).
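If you’re not sure which packages are affected, a small shell sketch can list them. It relies on the “Built:” line that R writes into each installed package’s DESCRIPTION file (e.g. “Built: R 3.6.3; ...”). The function name is hypothetical, and you supply the library path yourself (running .libPaths() in R prints it).

```shell
# List packages in an R library that were built under R 3.x, i.e. before R 4.0.0.
# It reads the "Built:" line R writes into each installed package's DESCRIPTION.
# list_pre_r4_packages is a hypothetical helper name; pass your own library path.
list_pre_r4_packages() {
    local lib="$1" desc
    for desc in "$lib"/*/DESCRIPTION; do
        [ -f "$desc" ] || continue              # the glob matched nothing
        if grep -q '^Built: R [0-3]\.' "$desc"; then
            basename "$(dirname "$desc")"       # print the package's folder name
        fi
    done
}

# Example (the path is illustrative): list_pre_r4_packages "$HOME/R/library"
```

Each name it prints can then be re-installed from within R, e.g. with install.packages() for CRAN packages.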
On Linux, however, things can be much more painful.
Especially on an HPC, it is nearly impossible to install or update packages that require sudo privileges.
Luckily, my ex-boss, Dr. Chang, introduced me to the concept of app containerization with tools such as Docker and Singularity. I gave them a try and realized I can’t live without them. They are truly lifesavers.
In this post, I’d like to give a step-by-step tutorial on how to run Singularity containers in an HPC environment.
To begin with, a disclaimer: I’m not a CS person, so I’ll avoid jargon and use plain words instead.
Think of this scenario: you get the following message when running software X on your HPC:
Software X has been updated, but software Y has not. Since X depends on functions provided by Y, you can no longer run X.
– bang, ERROR.
You might think: OK, let me update Y. You try, and get another message:
You are not allowed to update Y since you are not the superuser of this computer.
– bang, ERROR again.
This is the simplest example of dependency hell. In reality, it can be much more complex, because dependencies are rarely one-to-one: updating one package may require installing several more, each of which needs “sudo” privileges.
One way out is to build your own Utopia, where you have full control.
Another advantage of this Utopia is that your analyses become reproducible. The package versions are fixed, so running your scripts will always give you the same results; nothing’s gonna change.
This kind of Utopia is what’s called a “container”.
As a bioinfo hobbyist and a Singularity lover, I’ll use bioinformatics-related tools to illustrate how to build and use Singularity containers. For other containerization tools, please check their documentation.
There are two ways to get a container: download one from a public repository, or build one from scratch.
Public repositories are the easiest way to get a container. Docker Hub and Singularity Hub are among the most popular; there you can search for pre-built containers for conda, Jupyter Notebook, RStudio, etc. Many of these containers are open source, so you can download and use them for free.
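For example, an image from Docker Hub can be pulled and converted into a Singularity image file (SIF) in one step. This is a minimal sketch: the function name is hypothetical, and rocker/rstudio:4.1.1 is only an example image.

```shell
# Pull a public Docker Hub image and convert it into a Singularity image (.sif).
# pull_public_image is a hypothetical helper; the image below is only an example.
pull_public_image() {
    local uri="$1" out="$2"
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found; load it first (e.g. module load singularity)" >&2
        return 1
    fi
    # "singularity pull" takes an optional output file name before the image URI
    singularity pull "$out" "$uri"
}

# Usage: pull_public_image docker://rocker/rstudio:4.1.1 rstudio_4.1.1.sif
```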
If you can’t find a container that suits your needs, you can build one from scratch. This is a bit more complicated, but not that hard. In this post, I’ll use Singularity to illustrate how to build a container.
Basically, all you need is a “recipe” (Singularity calls it a definition file): a file containing the instructions for building a container. Here’s an example:
Bootstrap: docker
From: debian:stable-slim
%labels
Version v0.0.1
%help
Please contact Jiayi for help.
%post
## basic tools
apt-get update
apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
locales
echo 'en_US.UTF-8 UTF-8' >> /etc/locale.gen && \
/usr/sbin/locale-gen en_US.UTF-8 && \
/usr/sbin/update-locale LANG=en_US.UTF-8
## misc tools
apt-get update && apt-get install -y --no-install-recommends \
file \
less \
openssl \
libssl-dev \
curl \
coreutils \
gdebi-core
## devel tools
apt-get update && apt-get install -y --no-install-recommends \
g++ \
gfortran \
make \
cmake
## dependencies
apt-get update && apt-get install -y --no-install-recommends \
gettext \
pkg-config \
autoconf-archive \
autoconf \
automake \
autopoint \
txt2man \
build-essential
## aria2 (download accelerator)
apt-get -y install aria2
Once the recipe is done, you can paste it into the Singularity Container Builder to build your own container.
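If you’d rather stay on the command line, the recipe can also be saved as a definition file and built with “singularity build”. This sketch assumes your Singularity supports remote builds, which need a Sylabs Cloud access token set up once with “singularity remote login”; on clusters where the admins have enabled it, “--fakeroot” is a local alternative. The file names are hypothetical, and the recipe is a trimmed-down version of the one above.

```shell
# Save a (trimmed) recipe to a definition file, then build it without sudo.
# File names are hypothetical; --remote sends the build to the Sylabs Cloud builder.
DEF_FILE="bioinfo-tools.def"
SIF_FILE="bioinfo-tools.sif"

# The quoted 'EOF' stops the shell from expanding anything inside the recipe.
cat > "$DEF_FILE" <<'EOF'
Bootstrap: docker
From: debian:stable-slim

%labels
    Version v0.0.1

%post
    apt-get update
    apt-get install -y --no-install-recommends ca-certificates curl
EOF

if command -v singularity >/dev/null 2>&1; then
    singularity build --remote "$SIF_FILE" "$DEF_FILE"
else
    echo "singularity not found; recipe saved to $DEF_FILE" >&2
fi
```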
In the code blocks in the rest of this section, my comments and recommendations start with “## Jiayi:”.
Here, I’ll use one of my Singularity containers as an example. I’ve built several Singularity containers that can boost productivity in bioinformatics research; you can find them here.
First, Singularity needs to be installed and loaded on your HPC. If it isn’t installed, you can follow the instructions here. If it is, simply load it:
module load singularity/3.8.3
## Jiayi: check with your HPC admin for the version, or check with `module avail singularity`.
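If you’re unsure whether the module actually loaded, a quick check helps. This sketch looks for a “singularity” command and, as a fallback, “apptainer” (the Linux Foundation continuation of the Singularity project, which some clusters now provide instead). The function name is hypothetical.

```shell
# Print the first container runtime found on PATH, or fail if there is none.
# find_container_cli is a hypothetical helper name.
find_container_cli() {
    local cli
    for cli in singularity apptainer; do
        if command -v "$cli" >/dev/null 2>&1; then
            echo "$cli"
            return 0
        fi
    done
    return 1
}

if CLI=$(find_container_cli); then
    echo "using $CLI"
else
    echo "no container runtime found; try: module avail singularity" >&2
fi
```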
Second, pull the container from the cloud. The following command pulls the container and saves it in the current directory as a file named “rserver_4-1-1bioinfo.sif”:
singularity pull --arch amd64 library://jiayiliujiayi/rserver/rserver:4-1-1bioinfo
## Jiayi: This container is an RStudio Server container, bundled with common R packages including Seurat, ggplot2, tidyverse, dplyr, etc.
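The file name matters, because the SLURM script below has to point at it. As far as I know, when no output name is given, Singularity 3.x saves the image as <name>_<tag>.sif (using “latest” if the URI has no tag), which is where “rserver_4-1-1bioinfo.sif” comes from. This hypothetical helper mirrors that rule so you can predict the name; running “ls *.sif” after pulling is the surest check.

```shell
# Predict the file name "singularity pull <URI>" saves to when no name is given,
# assuming Singularity 3.x's <name>_<tag>.sif convention ("latest" if no tag).
# default_sif_name is a hypothetical helper; it ignores digests like @sha256:...
default_sif_name() {
    local ref name tag
    ref="${1#*://}"        # drop the scheme, e.g. "library://" or "docker://"
    ref="${ref##*/}"       # keep the last path component, e.g. "rserver:4-1-1bioinfo"
    name="${ref%%:*}"      # text before the first colon
    if [ "$name" = "$ref" ]; then
        tag="latest"       # no tag in the URI
    else
        tag="${ref#*:}"    # text after the first colon
    fi
    echo "${name}_${tag}.sif"
}
```

You can also sidestep the guessing by passing an output name as the first argument to singularity pull.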
Third, run the container on your HPC. I’ll use a SLURM environment as an example. In your working directory, create a script named “run_rstudio.sh” with the following content:
#!/bin/bash
## Jiayi: This is a SLURM script. Check with your HPC admin for the site-specific details (marked with XXX). You can also check the SLURM documentation: https://slurm.schedmd.com/
#SBATCH --job-name=rstudio
#SBATCH --output=rstudio_%j.out
#SBATCH --error=rstudio_%j.err
#SBATCH --time=3-00:00:00 ## Jiayi: check with your HPC admin for the time limit, or change it to the time length you need.
#SBATCH --mem=100G
#SBATCH --cpus-per-task=10
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=XXX ## Jiayi: change it to your email address if you want email updates on the job status.
# load the singularity module
module purge
module load singularity/3.8.3
# prepare for the Rstudio server
## Do not suspend idle sessions.
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0
## Container user set ups
export SINGULARITYENV_USER=XXX ## Jiayi: change it to your username on the HPC
export SINGULARITYENV_PASSWORD=XXXX ## Jiayi: define your own password, and it doesn't necessarily need to be the same as your HPC password.
## Get the IP address of the compute node
IP_ADDR=$(hostname -i)
echo "$IP_ADDR"
## write the connection instructions to the job's output file (rstudio_<jobid>.out)
## Jiayi: in the ssh line below, change XXX to your HPC login node's address.
cat <<END
1. SSH tunnel from your workstation using the following command:
ssh -N -L 8787:${IP_ADDR}:8787 ${SINGULARITYENV_USER}@XXX
and point your web browser to http://localhost:8787
2. Log in to RStudio Server using the following credentials:
user: ${SINGULARITYENV_USER}
password: ${SINGULARITYENV_PASSWORD}
When done using RStudio Server, terminate the job by:
1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:
scancel -f ${SLURM_JOB_ID}
END
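# start the Rstudio server
## create the writable directories and the database config file that get bound into the container
mkdir -p "$PWD/run" "$PWD/var-lib-rstudio-server"
printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > "$PWD/database.conf"
## Jiayi: in --bind, change $PWD:/XXX/ to your working directory, and the directory where you want to save the Rstudio session.
singularity exec --cleanenv \
--bind "$PWD/run:/run,$PWD/var-lib-rstudio-server:/var/lib/rstudio-server,$PWD/database.conf:/etc/rstudio/database.conf,$PWD:/XXX/,/tmp" \
~/rserver_4-1-1bioinfo.sif \
rserver \
--server-user ${USER} \
--auth-none=0 \
--auth-pam-helper-path=pam-helper
printf 'rserver exited\n' 1>&2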
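One caveat about this script: it hard-codes port 8787, so if someone else’s RStudio Server is already running on the same compute node, yours will fail to start. Below is a minimal sketch of picking a free port instead, using bash’s built-in /dev/tcp redirection to test whether anything is listening locally. The function names and the port range are my own assumptions. You would then pass “--www-port $PORT” to rserver and use $PORT in place of the second 8787 in the ssh line.

```shell
# Return success if something on this node accepts TCP connections on port $1.
# Uses bash's /dev/tcp pseudo-path, so this needs bash (not plain sh).
port_in_use() {
    ( : <>"/dev/tcp/127.0.0.1/$1" ) 2>/dev/null
}

# Print the first port in [$1, $2] that nothing is listening on.
# pick_free_port is a hypothetical helper; the range is up to you.
pick_free_port() {
    local port
    for port in $(seq "$1" "$2"); do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
    done
    return 1
}

if PORT=$(pick_free_port 8787 8887); then
    echo "RStudio Server will listen on port $PORT"
else
    echo "no free port found in 8787-8887" >&2
fi
```

Another process could still grab the port between this check and rserver starting, so this narrows the problem rather than eliminating it.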
Fourth, you can submit the job to the HPC using the following command:
sbatch run_rstudio.sh
In your working directory, you’ll see two files: “rstudio_XXXX.out” and “rstudio_XXXX.err”. The former is the job’s output, and the latter is the error log. If you run into any problems, check the error log first. If there are no errors, open the output file, where you’ll see a line like this:
ssh -N -L 8787:${IP_ADDR}:8787 ${SINGULARITYENV_USER}@XXX
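Copy that command (in the file it contains the actual node IP address and your username) and run it in a terminal on your local machine. Keep it running while you use RStudio.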
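Instead of opening the log by hand, you can grep the tunnel command out of it on the login node, in the directory you submitted from. This is a minimal sketch with a hypothetical function name. It checks both the .out and the .err file, since which one receives the message depends on whether your script prints it to stdout or stderr.

```shell
# Print the SSH tunnel line written by run_rstudio.sh for a given job ID.
# get_tunnel_cmd is a hypothetical helper; run it where the job logs live.
get_tunnel_cmd() {
    local jobid="$1" f
    for f in "rstudio_${jobid}.out" "rstudio_${jobid}.err"; do
        if [ -f "$f" ] && grep -q '^ssh -N -L' "$f"; then
            grep -m 1 '^ssh -N -L' "$f"
            return 0
        fi
    done
    echo "no tunnel command for job ${jobid} yet; is it running? (squeue -u \$USER)" >&2
    return 1
}

# Example:
#   JOBID=$(sbatch --parsable run_rstudio.sh)   # prints only the job ID
#   squeue -j "$JOBID"                          # wait until the state is R (running)
#   get_tunnel_cmd "$JOBID"
```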
Finally, open your web browser and type “localhost:8787” in the address bar. You’ll see the RStudio Server login page; log in with your username and password. And voilà: you are in the RStudio Server environment of your Utopia.
Also, you may want to take a look at the rest of the output file, which explains how to terminate the SLURM job.
2022-01-25 08:00:00
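For me, the keyword of this year is “change”: from office worker to student; from Texas to New Jersey; from analysis gradually toward methods in my work and studies; and from a hurried working style to a more careful one... There were also unprecedented moments when “change outpaced plans”: I planned to go back to China at the start of the year but had to give up because of quarantine and airfare; in March I was confident I’d get into Weill Cornell but was rejected; in May I planned to go to South America to renew my visa, but my appointment was suddenly canceled; in June I planned to WFH but, for visa reasons, had to leave my job two months early...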
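The first half of the year was mostly taken up by applications and interviews. Because of the pandemic, every interview was online, which felt rather unreal: it was hard to tell whether the person on the screen was a human or a robot, and one interviewer even ate cereal while interviewing me, which did not look good. But I couldn’t say anything, since they were the ones deciding my fate. At work, with limited energy and no future in sight for my project, I slacked off a bit. Fortunately, I left mid-year; otherwise I’d have kept going around in circles, which was quite torturous. Still, I’m very grateful to my boss for giving me this job; without it, applying to schools would probably have been harder.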
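After leaving the job, I spent a month and a half in London for visa reasons. It was one of the few leisurely stretches I’ve had in the past five years: picking cherries in Uncle Liu’s garden, making cherry jam, reading in the sun, cooking and making soup, slowly touring London’s sights... Special thanks go to Uncle Liu for generously helping with food, lodging, transport, and sightseeing, and for calming me down while I anxiously waited for my visa. All in all, I enjoyed every moment in London, except for waiting for the visa check to clear.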
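When I came back to the US, I was treated to the “little black room” (secondary inspection) at customs and questioned for almost an hour; I’d rather not go into the details. Fortunately, I was let in.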
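In September, my student life began. The coursework basically went through every subject of basic medical science once more; I can’t help wondering how many more times in my life I’ll be ruled by the “three -ologies and one chemistry” (physiology, pathology, pharmacology, and biochemistry). As for lab rotations, the first lab worked on algorithms and the second on analysis. The algorithm work was very appealing; compared with analysis, it gave me more of a sense of the “purity” of research, and I really want to keep building something there, so I’ve also started teaching myself statistics and CS. In the second lab, I already knew the experimental and analysis workflows well, got up to speed quickly, and produced some results, so it was largely a chance to coast, or rather, to free up time for learning the things I just mentioned. The second rotation also strengthened my resolve not to take the combined dry + wet lab path in the future.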
Last year I read a number of books and, as in the year before, logged them in Notion. I hope to keep it up this year.
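Speaking of Notion, I spent the whole of last year battling with productivity apps and methods. I tried seven or eight tools that claim to boost efficiency or productivity, but none of them suited me; at least for me, none made much of a difference. These apps and tools all have fancy designs, but most are cluttered with features and lack focus, so in the end I went back to pen and paper: the method I’ve used for years is still the one that works best for me.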
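Another thing that drives me crazy is the low signal-to-noise ratio of the information I get online. To tackle this, I had set up RSS subscriptions, but over the past few months I’ve increasingly felt that many feeds offer little of value; and the articles I save in Instapaper sit there for ages without me remembering to read them. So I’ve begun to question whether this content is worth reading at all, or rather, what difference not reading it would make to my life. My gut feeling is that I’m often too anxious about, or too bothered by, things I don’t know, even though they have nothing to do with me. Still, this needs to be confirmed, and from there I’ll look for the content that is “important” to me.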
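All of the above seems to come down to a “sense of control”: control over how I spend my time and what information I take in. In my studies, too, the shift from “analysis” to “methods” reflects a wish to go from someone who uses tools to someone who builds them, and so to master the algorithms and logic behind them.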
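Health-wise, I had five cavities filled and one root canal, and continued taking sertraline. My dentist is puzzled by how many cavities I have, and I have no idea why either, since I use floss, mouthwash, and an electric toothbrush. Sigh, it’s just fate.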
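As for daily life, my cooking has improved, and I now dare to try lots of dishes that need high-heat stir-frying, thanks to the so-called luxury apartment we’re renting now. Thinking back to renting a garage in LA and a hexagonal studio in Houston, and then looking at our current apartment with a gas stove and a range hood, to borrow a cliché, happiness just wells up. Here I have to thank my boyfriend for paying 60% of the rent, and for showering every one of my bizarre experimental dishes with enthusiastic praise.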
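Looking back on 2021, the two of us are still earning a cabbage seller’s money while carrying a drug dealer’s worries, as the Chinese saying goes, and this will continue for the next few years. But life goes on; what matters is cherishing the people around us and living each day well. I don’t have any big wishes for 2022: I just hope my family and friends stay healthy and safe, and that the pandemic ends soon. I’d really love to go to Liuzhou for authentic luosifen (river snail rice noodles).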