Linux
- chmod
  - chmod 400 file - Read by owner
  - chmod 040 file - Read by group
  - chmod 004 file - Read by world
  - chmod 200 file - Write by owner
  - chmod 020 file - Write by group
  - chmod 002 file - Write by world
  - chmod 100 file - Execute by owner
  - chmod 010 file - Execute by group
  - chmod 001 file - Execute by world
  - chmod 444 file - Allow read by owner, group, and world
  - chmod 777 file - Allow everyone to read, write, and execute the file
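Each octal digit is simply the sum of read (4), write (2), and execute (1), applied to owner, group, and world in that order. As an optional illustration, here is a small Python sketch (standard library only, using the stat module) that decodes a mode this way; the 754 mode is just an arbitrary example.
import stat

mode = 0o754  # example: owner rwx (4+2+1), group r-x (4+1), world r-- (4)
for who, bits in (("owner", (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR)),
                  ("group", (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP)),
                  ("world", (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH))):
    perms = "".join(c if mode & bit else "-" for c, bit in zip("rwx", bits))
    print(who, perms)
# prints: owner rwx / group r-x / world r--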
ARCH Linux Setup
-
Setup WIFI
iwctl
station wlan0 get-networks
station wlan0 connect <Network name>
-
Arch Install
- GUI
archinstall
or
-
CLI
Create partition
cfdisk /dev/nvme0n1
800 M for EFI System
> 20 GB for Linux filesystem
... for Linux swap
Format
mkfs.fat -F32 /dev/<EFI System>
mkfs.ext4 /dev/<Linux filesystem>
mkswap /dev/<swap>
Mount
# root
mount /dev/<Linux filesystem> /mnt
mkdir /mnt/boot
mount /dev/<EFI System> /mnt/boot
swapon /dev/<swap>
Install
pacstrap -i /mnt base base-devel linux-zen linux-firmware git sudo neofetch htop intel-ucode nano vim bluez bluez-utils networkmanager
genfstab -U /mnt >> /mnt/etc/fstab
cat /mnt/etc/fstab
Enter the system
arch-chroot /mnt
# change root password
passwd
# create user
useradd -m -g users -G wheel,storage,power,video,audio -s /bin/bash <username>
passwd <username>
EDITOR=vim visudo
# uncomment the line: %wheel ALL=(ALL:ALL) ALL
Timezone
ln -sf /usr/share/zoneinfo/... /etc/localtime
hwclock --systohc
vim /etc/locale.gen
# uncomment en_US ...
locale-gen
vim /etc/locale.conf
# add "LANG=en_US.UTF-8"
Hostname
vim /etc/hostname
# add hostname
vim /etc/hosts
# add these lines:
127.0.0.1 localhost
::1 localhost
127.0.1.1 <hostname>.localdomain <hostname>
Bootloader
pacman -S grub efibootmgr dosfstools mtools
grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=GRUB
grub-mkconfig -o /boot/grub/grub.cfg
Finish
systemctl enable bluetooth
systemctl enable NetworkManager
exit
umount -lR /mnt
Unplug the USB drive and boot into the system
-
Enable Wi-Fi radio
nmcli dev status
nmcli radio wifi on
nmcli dev wifi list
sudo nmcli dev wifi connect <name> password "<password>"
# update
sudo pacman -Syu
Install Desktop GUI
sudo pacman -S xorg sddm plasma-meta plasma-workspace kde-applications
sudo systemctl enable sddm
sudo systemctl start sddm
-
Fix the Discover app backend
sudo pacman -Sy flatpak
Install Nvidia Driver
lspci | grep -E "NVIDIA"
sudo pacman -Sy nvidia
-
Edit boot loader
sudo pacman -Sy os-prober
sudo vim /etc/default/grub
# change the following line
# GRUB_TIMEOUT=20
# uncomment GRUB_DISABLE_OS_PROBER=false
sudo grub-mkconfig -o /boot/grub/grub.cfg
-
Chinese Character and Keyboard
sudo pacman -S noto-fonts noto-fonts-cjk noto-fonts-extra noto-fonts-emoji ttf-dejavu ttf-liberation
sudo pacman -S fcitx5-im fcitx5-rime
mkdir -p ~/.local/share/fcitx5/rime
cd ~/.local/share/fcitx5/rime
git clone https://github.com/iDvel/rime-ice.git
cp -r ./rime-ice/* .
SSH Configuration
-
RSA
- RSA keys have been the default for many years and are supported by almost all SSH clients and servers. They are well-understood and trusted in various computing environments. Many systems default to RSA key lengths of 2048 or 3072 bits, though some users prefer 4096 bits for enhanced security.
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
-
Ed25519
- Ed25519 is increasingly popular due to its strong security features and efficiency. It uses elliptic curve cryptography to provide excellent security with shorter keys, resulting in faster performance and less data usage during authentication. Many modern systems and security guidelines now recommend Ed25519 as the preferred choice for new key generation.
ssh-keygen -t ed25519 -C "your_email@example.com"
-
ECDSA
- ECDSA is another commonly used type, particularly because it also offers good security with shorter key lengths compared to RSA. It's often used where there's a need for a balance between compatibility and modern cryptographic practices. ECDSA keys using the NIST P-256 curve (nistp256) are particularly common.
ssh-keygen -t ecdsa -b 256 -C "your_email@example.com"
- note: RSA and Ed25519 are generally the most recommended, with Ed25519 often preferred for new deployments due to its robustness and efficiency. RSA remains widely used due to its long history and broad support across older and legacy systems. For new systems or updates, transitioning to Ed25519 from RSA or ECDSA is a common recommendation for enhanced security and performance.
-
Server Config
- copy and paste the public keys to the authorized_keys file on the server.
echo "paste-your-public-key-here" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
-
Local Config
- Create a config file in the ~/.ssh folder
Host <custom name>
    HostName <hostname, the part after the @>
    User <username>
    IdentityFile <private key location>
- after configuration use the following command to connect to the server
ssh "custom name"
Install Ansible
- Initializing
- Create a folder inventory with a hosts file in it
[server] # group name
{ip address} {server name}
- Ping the servers using password authentication
ansible -i ./inventory/hosts server -m ping --user sysadmin --ask-pass
- Create a folder playbooks with a YAML file apt.yml in it
- hosts: "*"
  become: true
  tasks:
    - name: apt
      apt:
        update_cache: yes
        upgrade: 'yes'
- Run the playbook
ansible-playbook ./playbooks/apt.yml --user serveradmin --ask-pass --ask-become-pass -i ./inventory/hosts
- Create a file qemu-guest-agent.yml under playbooks to install a module
- name: install latest qemu-guest-agent
  hosts: "*"
  tasks:
    - name: install qemu-guest-agent
      apt:
        name: qemu-guest-agent
        state: present
        update_cache: true
      become: true
- Add the mattermost playbook
---
- name: Install Mattermost Server
  hosts: all
  become: yes
  vars:
    mattermost_version: 5.31.0
    mattermost_db_name: mattermost
    mattermost_db_user: mmuser
    mattermost_db_password: mmuser_password
  tasks:
    - name: Install necessary packages
      apt:
        name: "{{ item }}"
        state: present
      with_items:
        - git
        - nginx
        - postgresql
        - postgresql-contrib
    - name: Create Mattermost user
      user:
        name: mattermost
        state: present
    - name: Clone Mattermost server
      git:
        repo: 'https://github.com/mattermost/mattermost-server.git'
        dest: "/opt/mattermost-server"
        version: "v{{ mattermost_version }}"
      become: yes
      become_user: mattermost
    - name: Configure PostgreSQL
      block:
        - name: Create Mattermost database
          postgresql_db:
            name: "{{ mattermost_db_name }}"
            login_user: postgres
        - name: Create Mattermost database user
          postgresql_user:
            db: "{{ mattermost_db_name }}"
            name: "{{ mattermost_db_user }}"
            password: "{{ mattermost_db_password }}"
            priv: ALL
            login_user: postgres
    - name: Set up Mattermost configuration
      template:
        src: mattermost_config.json.j2
        dest: "/opt/mattermost-server/config/config.json"
        owner: mattermost
        mode: '0644'
    - name: Start Mattermost service
      systemd:
        name: mattermost
        state: started
        enabled: yes
Virtual Machine with Vagrant
-
Download and Install Tools
- Download and install VirtualBox from the Official VirtualBox website.
- Download and install Vagrant from the Official Vagrant website.
-
Get the Linux Box from Vagrant Cloud
- Visit Vagrant Cloud to find a suitable Linux box. Alternatively, you can add a Linux box directly using the command line:
vagrant box add [box_name]
Replace [box_name] with the name of the Linux box you want to use.
-
Initialize Vagrant Environment
- Initialize the VM with the following command:
vagrant init [box_name]
Again, replace [box_name] with the name of your chosen box.
-
Start the Virtual Machine
- Start the VM with:
vagrant up
-
Check Installed Linux Box Version
- To check the installed Linux version and other boxes, use:
vagrant box list
-
Connect to VM
- Connect to your VM via SSH using:
vagrant ssh
-
Disconnect from the VM
- suspend the VM
vagrant suspend
- resume from suspend
vagrant resume
- shutdown the VM
vagrant halt
Add-on features for Linux apps
-
NeoVim Setup
Requirements:
- Install Nerd font first
wget https://github.com/ryanoasis/nerd-fonts/releases/download/v3.2.1/Hack.zip
unzip Hack.zip
mkdir -p ~/.local/share/fonts
cp Hack/*.ttf ~/.local/share/fonts/
fc-cache -fv
- Install npm
sudo apt install npm
git clone https://github.com/Henryfzh/documentation.git
- `gcc` - Toggles the current line using linewise comment
- `gbc` - Toggles the current line using blockwise comment
- `[count]gcc` - Toggles the number of lines given as a prefix-count using linewise comment
- `[count]gbc` - Toggles the number of lines given as a prefix-count using blockwise comment
- `gc[count]{motion}` - (Op-pending) Toggles the region using linewise comment
- `gb[count]{motion}` - (Op-pending) Toggles the region using blockwise comment
-
Theme
Blur the windows:
- mutter-rounded
- mutter-rounded setting
-
mdBook
Requirements:
- Install Rust
cargo install mdbook
-
tmux
Install TPM:
- Clone:
git clone https://github.com/tmux-plugins/tpm ~/.tmux/plugins/tpm
- Create ~/.tmux.conf, and add following to it:
# List of plugins
set -g @plugin 'tmux-plugins/tpm'
set -g @plugin 'tmux-plugins/tmux-sensible'
set -g @plugin 'catppuccin/tmux'
set -g @catppuccin_flavour 'mocha'
set -g default-terminal 'tmux-256color'
# keep this line at the very bottom of the file
run '~/.tmux/plugins/tpm/tpm'
- Install the plugins (inside tmux):
Ctrl + B, then I
- Reload:
tmux source ~/.tmux.conf
-
zsh fuzzy finder
fzf
.zshrc plugins
powerlevel10k
copypath copyfile copybuffer
To use flatpak zsh in VS Code, add the following lines to VS Code's settings.json:
"terminal.integrated.defaultProfile.linux": "bash",
"terminal.integrated.profiles.linux": {
  "bash": {
    "path": "/usr/bin/flatpak-spawn",
    "overrideName": true,
    "args": ["--host", "--env=TERM=xterm-256color", "zsh"]
  }
},
Docker Basics
-
Build
sudo docker build -t <target-name> -f Dockerfile .
-
Run
- With bash
sudo docker run --rm -it --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined <target-name> /bin/bash
- Without bash
sudo docker run --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rochpl.6.0 mpirun_rochpl -P 1 -Q 1 -N 45312
- Mount a Directory
sudo docker run --rm -it --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined --network host --name rochpl_node -v <directory>:/opt rochpl /usr/sbin/sshd -D
-
Update Docker
- commit the changes
docker ps # to get the container id
docker commit <container-id> <new-image-name>
-
Clean
# remove all images
sudo docker system prune -a
sudo docker container prune
sudo docker buildx prune -f
High Performance Linpack
- Install OpenMPI
- Download and unzip
wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.6.tar.gz
tar xzf openmpi-5.0.6.tar.gz
cd openmpi-5.0.6/
- Compile
Preferably install to "/usr/local/", but that requires sudo access.
./configure --prefix=<OPENMPI_INSTALL_DIRECTORY>
make
make install
- Install OpenBLAS
- Download and unzip
wget https://github.com/OpenMathLib/OpenBLAS/releases/download/v0.3.28/OpenBLAS-0.3.28.tar.gz
tar xzf OpenBLAS-0.3.28.tar.gz
cd OpenBLAS-0.3.28/
- Compile
Preferably install to "/usr/local/", but that requires sudo access.
make
make PREFIX=<OPEN_BLAS_INSTALL_DIRECTORY> install
- Update Path
- Update path to OpenMPI and OpenBLAS in .bashrc or .zshrc
export PATH=<OPENMPI_INSTALL_DIRECTORY>/bin:$PATH
export LD_LIBRARY_PATH=<OPENMPI_INSTALL_DIRECTORY>/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=<OPEN_BLAS_INSTALL_DIRECTORY>/lib:$LD_LIBRARY_PATH
source ~/.bashrc
or
source ~/.zshrc
- Download HPL
- Using wget or curl download from official website:
wget http://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
or
curl -O http://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
- Unzip
tar -xf hpl-2.3.tar.gz
1. Compile with CPU
2. Compile with AMD GPU
High Performance Linpack with CPU
- Compile HPL
- Copy the template Makefile:
cp setup/Make.Linux_Intel64 Make.Linux_Intel64
- Edit the make file and change following lines:
TOPdir = <hpl-2.3 top folder directory>
MPdir = <openmpi file directory>
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib -lmpi
LAdir = <openblas file directory>
LAinc = -I$(LAdir)/include
LAlib = $(LAdir)/lib/libopenblas.a
CC = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -O3 -w -z noexecstack -z relro -z now -Wall # modify this according to the cpu
LINKFLAGS = $(CCFLAGS) $(OMP_DEFS)
- Compile
make arch=Linux_Intel64
- If you want to clean:
make clean arch=Linux_Intel64
- Run HPL
-
Edit the file bin/Linux_Intel64/HPL.dat inside the top folder.
Here is an example with 8GB RAM and 4 Cores CPU:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
29184        Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0            Number of additional problem sizes for PTRANS
1200 10000 30000 values of N
0            number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64 values of NB
-
To tune the parameters, you can reference the website here; the example above is not guaranteed to be the optimal setup, so try tuning the parameters yourself.
The parameters you most likely need to tune are the following (a worked sizing sketch follows the list):
- Ps * Qs: the number of cores
- Ns: the problem size
- NBs: the block size
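A common rule of thumb for Ns is to size the N x N double-precision matrix to use roughly 80% of total RAM and round down to a multiple of NB. The short Python sketch below reproduces the Ns used in the example HPL.dat above (8 GB RAM, NB = 192); the 80% fraction is an assumption, not a fixed rule.
total_ram_bytes = 8 * 1024**3   # 8 GB of RAM
nb = 192                        # block size (NBs)
usable_fraction = 0.8           # leave headroom for the OS and MPI

n_max = int((usable_fraction * total_ram_bytes / 8) ** 0.5)  # 8 bytes per double
ns = (n_max // nb) * nb         # round down to a multiple of NB
print(ns)                       # 29184 for this configuration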
-
Run benchmark
mpirun -np <number of cores> ./xhpl
-
High Performance Linpack with AMD GPU
Prepare (Download the Dockerfile)
Dockerfile for AMD GPU
1. Build Dockerfile
sudo docker build -t rochpl -f Dockerfile .
2. Setup Docker image
- Node A
docker save -o rochpl_image.tar rochpl
scp rochpl_image.tar user@10.0.0.12:~
- Node B
docker load -i ~/rochpl_image.tar
- Both nodes
sudo docker run --rm -it \ --device /dev/kfd \ --device /dev/dri \ --security-opt seccomp=unconfined \ --network=host \ --name=rochpl_node \ rochpl /bin/bash
- Setup SSH keys
# Both Nodes
ssh-keygen -t rsa -f ~/.ssh/id_rsa -q -N ""

# Both Nodes
vim /etc/ssh/sshd_config
# change the line --- PasswordAuthentication yes
# add this line --- PermitRootLogin yes

# Node A
ssh-copy-id -p 2222 root@10.0.0.12

# Node B
ssh-copy-id -p 2222 root@10.0.0.14
- Add following to both nodes
vim ~/.ssh/config
Host 10.0.0.14
    Port 2222
    User root
Host 10.0.0.12
    Port 2222
    User root
- Test if it works
ssh 10.0.0.14 hostname
3. Run HPL
- Add the rochpl_hostfile on both node
10.0.0.14 slots=4
10.0.0.12 slots=4
- Run HPL using this command (modify the arguments to suit your environment)
export OMPI_MCA_pmix=pmix
mpirun --hostfile rochpl_hostfile -np 8 --bind-to none -x HIP_VISIBLE_DEVICES=0,1,2,3 --mca pml ucx --mca btl ^vader,tcp,openib,uct ./run_rochpl -P 2 -Q 4 -N 256000 --NB 512
ARG UBUNTU_VERSION="jammy"
FROM ubuntu:${UBUNTU_VERSION}
ARG ROCM_URL="https://repo.radeon.com/amdgpu-install/6.1.1/ubuntu/jammy/amdgpu-install_6.1.60101-1_all.deb"
ARG UCX_BRANCH="v1.16.0"
ARG UCC_BRANCH="v1.3.0"
ARG OMPI_BRANCH="v5.0.3"
ARG APT_GET_APPS=""
ARG GPU_TARGET="gfx908,gfx90a,gfx942"
# Update and Install basic Linux development tools
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
ca-certificates \
git \
ssh \
openssh-client \
openssh-server \
make \
vim \
nano \
    libtinfo-dev \
initramfs-tools \
libelf-dev \
numactl \
curl \
wget \
tmux \
build-essential \
autoconf \
automake \
libtool \
pkg-config \
libnuma-dev \
gfortran \
flex \
hwloc \
libstdc++-12-dev \
libxml2-dev \
python3-dev \
python3-pip \
python3-distutils \
unzip ${APT_GET_APPS}\
&& apt-get clean
RUN wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/rocm.gpg \
&& wget -O rocm.deb ${ROCM_URL} \
&& apt install -y ./rocm.deb \
&& amdgpu-install --usecase=rocm,hiplibsdk --no-dkms -y
RUN bash -c """IFS=',' read -r -a ARCH <<<${GPU_TARGET} \
&& for gpu_arch in \${ARCH[@]}; do \
echo \$gpu_arch >> /opt/rocm/bin/target.lst; \
done""" \
&& chmod a+r /opt/rocm/bin/target.lst
# # Requires cmake > 3.22
RUN mkdir -p /opt/cmake \
&& wget --no-check-certificate --quiet -O - https://cmake.org/files/v3.27/cmake-3.27.7-linux-x86_64.tar.gz | tar --strip-components=1 -xz -C /opt/cmake
ENV ROCM_PATH=/opt/rocm \
UCX_PATH=/opt/ucx \
UCC_PATH=/opt/ucc \
OMPI_PATH=/opt/ompi \
GPU_TARGET=${GPU_TARGET}
# Adding rocm/cmake to the Environment
ENV PATH=$ROCM_PATH/bin:/opt/cmake/bin:$PATH \
LD_LIBRARY_PATH=$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib:$LD_LIBRARY_PATH \
LIBRARY_PATH=$ROCM_PATH/lib:$ROCM_PATH/lib64:$LIBRARY_PATH \
C_INCLUDE_PATH=$ROCM_PATH/include:$C_INCLUDE_PATH \
CPLUS_INCLUDE_PATH=$ROCM_PATH/include:$CPLUS_INCLUDE_PATH \
CMAKE_PREFIX_PATH=$ROCM_PATH/lib/cmake:$CMAKE_PREFIX_PATH
# Create the necessary directory for SSH
RUN mkdir /var/run/sshd
# Set root password for login
RUN echo 'root:redhat' | chpasswd
# Allow root login and password authentication
RUN sed -i 's/#PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config && \
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
echo "StrictModes no" >> /etc/ssh/sshd_config
# Change the SSH port to 2222
RUN sed -i 's/#Port 22/Port 2222/' /etc/ssh/sshd_config
# Expose the new SSH port
EXPOSE 2222
# Start the SSH service and keep the container running
ENTRYPOINT service ssh restart && bash
WORKDIR /tmp
# Install UCX
RUN git clone https://github.com/openucx/ucx.git -b ${UCX_BRANCH} \
&& cd ucx \
&& ./autogen.sh \
&& mkdir build \
&& cd build \
&& ../contrib/configure-release --prefix=$UCX_PATH \
--with-rocm=$ROCM_PATH \
--without-knem \
--without-xpmem \
--without-cuda \
--enable-optimizations \
--disable-logging \
--disable-debug \
--disable-examples \
&& make -j $(nproc) \
&& make install
# Install UCC
RUN git clone -b ${UCC_BRANCH} https://github.com/openucx/ucc \
&& cd ucc \
&& ./autogen.sh \
&& sed -i 's/memoryType/type/g' ./src/components/mc/rocm/mc_rocm.c \
    # offload-arch=native builds the local architecture, which may not be present at build time for a container.
&& sed -i 's/--offload-arch=native//g' ./cuda_lt.sh \
&& mkdir build \
&& cd build \
&& ../configure --prefix=${UCC_PATH} --with-rocm=${ROCM_PATH} --with-ucx=${UCX_PATH} --with-rccl=no \
&& make -j $(nproc) \
&& make install
# Install OpenMPI
RUN git clone --recursive https://github.com/open-mpi/ompi.git -b ${OMPI_BRANCH} \
&& cd ompi \
&& ./autogen.pl \
&& mkdir build \
&& cd build \
&& ../configure --prefix=$OMPI_PATH --with-ucx=$UCX_PATH \
--with-ucc=${UCC_PATH} \
--enable-mca-no-build=btl-uct \
--without-verbs \
--with-pmix=internal \
--enable-mpi \
--enable-mpi-fortran=yes \
--disable-man-pages \
--disable-debug \
&& make -j $(nproc) \
&& make install
# Adding OpenMPI, UCX, and UCC to Environment
ENV PATH=$OMPI_PATH/bin:$UCX_PATH/bin:$UCC_PATH/bin:$PATH \
LD_LIBRARY_PATH=$OMPI_PATH/lib:$UCX_PATH/lib:$UCC_PATH/lib:$LD_LIBRARY_PATH \
LIBRARY_PATH=$OMPI_PATH/lib:$UCX_PATH/lib:$UCC_PATH/lib:$LIBRARY_PATH \
C_INCLUDE_PATH=$OMPI_PATH/include:$UCX_PATH/include:$UCC_PATH/include:$C_INCLUDE_PATH \
CPLUS_INCLUDE_PATH=$OMPI_PATH/include:$UCX_PATH/include:$UCC_PATH/include:$CPLUS_INCLUDE_PATH \
PKG_CONFIG_PATH=$OMPI_PATH/lib/pkgconfig:$UCX_PATH/lib/pkgconfig/:$PKG_CONFIG_PATH \
OMPI_ALLOW_RUN_AS_ROOT=1 \
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
UCX_WARN_UNUSED_ENV_VARS=n
# Install Additional Apps Below
ARG HPL_BRANCH="main"
WORKDIR /opt
# Installing rocHPL
RUN git clone -b ${HPL_BRANCH} https://github.com/ROCmSoftwarePlatform/rocHPL.git \
&& cd rocHPL \
&& ./install.sh \
--prefix=/opt/rochpl \
--with-rocm=/opt/rocm/ \
--with-mpi=/opt/ompi \
&& rm -rf /tmp/rocHPL
ENV PATH=$PATH:/opt/rochpl:/opt/rochpl/bin
ENV HIP_VISIBLE_DEVICES=0,1,2,3
#CMD ["/usr/sbin/sshd", "-D"]
CMD ["/bin/bash"]
Machine Learning
Computer Vision
-
VGG16
- VGG16 has a total of 138 million parameters. The important point to note here is that all the conv kernels are of size 3x3 and maxpool kernels are of size 2x2 with a stride of two.
-
ResNet
- ResNet-18 has around 11 million trainable parameters. It consists of conv layers with 3x3 filters (just like VGGNet). Only two pooling layers are used throughout the network, one at the beginning and one at the end. Identity connections run between every two conv layers. The solid arrows show identity shortcuts, where the input and output dimensions are the same, while the dotted ones show projection connections, where the dimensions differ. (A quick parameter-count check for both models is sketched below.)
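As a quick sanity check of the parameter counts quoted for VGG16 and ResNet-18, the following optional Python sketch (assuming PyTorch and torchvision are installed; weights=None needs torchvision >= 0.13, older versions use pretrained=False instead) instantiates the models and counts their parameters.
import torchvision.models as models

def count_params(model):
    # total number of parameters, trainable or not
    return sum(p.numel() for p in model.parameters())

for name, model in (("VGG16", models.vgg16(weights=None)),
                    ("ResNet-18", models.resnet18(weights=None)),
                    ("ResNet-50", models.resnet50(weights=None))):
    print(f"{name}: {count_params(model) / 1e6:.1f}M parameters")
# roughly: VGG16 ~138M, ResNet-18 ~11.7M, ResNet-50 ~25.6M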
-
Architecture Differences:
- VGG16: VGG16 is a deep convolutional network with a straightforward and uniform architecture, consisting of 16 layers with very small (3x3) convolution filters. It is known for its simplicity and has been a popular choice for image classification tasks.
- ResNet: ResNet, particularly ResNet-50, uses residual connections that help mitigate the vanishing gradient problem, allowing for the training of much deeper networks. ResNet architectures are typically deeper and more complex than VGG16, which generally results in better feature extraction and higher accuracy in many tasks.
-
Performance:
- Accuracy: ResNet models, due to their depth and residual connections, generally outperform VGG16 in many image recognition tasks, including object detection. They are able to learn more complex features and provide better accuracy.
- Computation and Memory: ResNet models are usually more computationally expensive and require more memory compared to VGG16. This can be a consideration if you have limited computational resources.
-
Application in Object Detection:
- Object detection frameworks such as Faster R-CNN, SSD, and YOLO have utilized both VGG and ResNet as backbone feature extractors. In many cases, ResNet-based models have shown better performance in terms of both precision and recall.
- For instance, Faster R-CNN with a ResNet-50 or ResNet-101 backbone generally performs better than the same framework with a VGG16 backbone.
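To illustrate the point above, here is a minimal sketch (assuming PyTorch and torchvision are installed; the constructor is torchvision's fasterrcnn_resnet50_fpn from the detection module) that loads Faster R-CNN with a ResNet-50 FPN backbone and runs it on a dummy image.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# weights=None builds an untrained model; pass COCO weights instead for real detections
model = fasterrcnn_resnet50_fpn(weights=None)
model.eval()

dummy_image = [torch.rand(3, 480, 640)]  # a list of CHW tensors with values in [0, 1]
with torch.no_grad():
    predictions = model(dummy_image)

print(predictions[0].keys())  # per-image dict with 'boxes', 'labels', 'scores'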
Practical Considerations:
-
ResNet Advantages:
- Better accuracy and feature representation due to deeper network architecture.
- Residual connections help in training deeper networks, resulting in improved performance.
-
VGG16 Advantages:
- Simpler architecture which can be easier to implement and train.
- Less computationally intensive compared to ResNet.
Conclusion:
In general, ResNet models tend to be better than VGG16 for object detection tasks due to their superior feature extraction capabilities and higher accuracy. However, this comes at the cost of increased computational requirements.
If computational resources are not a constraint, it is recommended to use ResNet (e.g., ResNet-50 or ResNet-101) for better performance in object detection. However, if you need a simpler and less resource-intensive model, VGG16 is still a viable option and can achieve good results.