Direct Code Execution Modernization

Student: Parth Pratim Chatterjee

Mentors: Tom Henderson, Vivek Jain, Apoorva Bhargava

Organization: The ns-3 Network Simulator Project
Project Overview

During GSoC 2021, I worked on upgrading DCE (Direct Code Execution), a project under the ns-3 umbrella. DCE provides a special capability to the ns-3 network simulator: while most protocol implementations in ns-3 are written specifically for the simulation environment (often with abstractions that simplify some details), DCE allows actual implementations of C/C++ applications (and Linux kernel code) to be used directly in the simulation context. This helps align simulations with real-world experiments and saves the ns-3 user from rewriting protocol implementations specifically for ns-3. DCE supports a wide range of use cases, from a single application such as the httpd server to complex networks built on real client-server networking applications. However, DCE was largely developed five to ten years ago, and the original maintainers have not been able to devote as much time to it recently. As a result, the DCE code has atrophied over the past few years, becoming harder and harder to use as operating systems, compilers, and Python distributions evolved. At the start of the project, DCE was barely working on an end-of-life Ubuntu distribution (Ubuntu 16.04) with a Linux kernel version (4.4) released in 2016. The goal of this project was therefore to make DCE work on the most recent Ubuntu LTS release (20.04) and to support a more modern Linux kernel version. As described below, the project succeeded on both goals.

The main barrier to supporting modern Linux releases is how DCE reuses the GNU C library (glibc). Historically, DCE was able to reuse the system-provided glibc through a technique known as vtable mangling, which allows DCE to hijack calls into the system library, such as all stdio calls on a FILE stream. In later glibc releases, however, security checks were introduced on this technique as well as on loading position-independent executables (PIE), which is how DCE loads userspace applications such as iperf and httpd into its own address space and executes them on top of the Linux or ns-3 network stack, as configured in the simulation script. The Linux kernel stack also had to be upgraded to release 5.10 or above. Finally, the existing Python bindings generation and compilation code was outdated: it was not aligned with newer code changes or with Python 3, and it used hard-coded binding type/class lookup, which reduced flexibility.
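As a rough illustration of the PIE requirement mentioned above, the following Python sketch reads the ELF header of a binary and reports whether it was linked as a position-independent object. It is not part of DCE or glibc and does not reproduce the glibc security checks; the helper names are hypothetical, and the path in the usage note is an assumption.

```python
import struct

ET_EXEC = 2  # executable linked at a fixed address
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field placed right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def is_position_independent(path: str) -> bool:
    """Hypothetical helper: True if the file at 'path' has ELF type ET_DYN."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

For example, `is_position_independent("/usr/bin/iperf")` (the path is an assumption) returns True when the file's type is ET_DYN. Shared libraries also use ET_DYN, so this check alone cannot tell a PIE executable from a shared library.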

I developed a custom glibc-based build for Ubuntu 20.04 and aligned the Bake build system to use it, so that the build steps remain unchanged. The custom glibc is built by cloning the libc-2.31 source and patching it to disable the security checks that cause trouble for DCE, in a manner that is completely automated through Bake. The newly built glibc has the same version as the system libc, which avoids load-time symbol lookup errors. The Python bindings support in DCE was also outdated and required an upgrade. I re-implemented the Python bindings compilation code and the Python API scanning code. I also patched the cpython lookup order to fix library-not-found errors.
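One way to sanity-check the version-matching point is to compare the glibc version reported by the running system with the version of the custom build. The sketch below is only an illustration and not part of Bake or DCE: the helper names are hypothetical, and the custom version string 2.31 is taken from the libc-2.31 source named above.

```python
import platform

CUSTOM_GLIBC_VERSION = "2.31"  # version of the patched source named in the text


def parse_version(text):
    """Turn a dotted version such as '2.31' into a comparable tuple (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def versions_match(system_version, custom_version=CUSTOM_GLIBC_VERSION):
    """Hypothetical check: True when both version strings denote the same release."""
    return parse_version(system_version) == parse_version(custom_version)


if __name__ == "__main__":
    # platform.libc_ver() returns a (library, version) pair; both strings
    # are empty when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        state = "matches" if versions_match(version) else "differs from"
        print(f"system glibc {version} {state} custom glibc {CUSTOM_GLIBC_VERSION}")
    else:
        print("could not determine the system glibc version")
```

Comparing parsed tuples rather than raw strings keeps the check correct for versions such as 2.9 versus 2.31, which compare the wrong way round as plain text.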

The current DCE implementation uses net-next-nuse-4.4.0, a Linux library based on Linux-4.4.0. I developed a Linux library export, net-next-nuse-5.10, using the Linux-5.10.47 LTS kernel as the base tag. I got userspace network applications such as iperf, netperf, and httpd to work in DCE on top of Linux-5.10. I also got ethtool to work on top of it and tested device statistics generation and some RX/TX device configuration operations; ethtool support is still a work in progress. I patched and fixed issues with the sysctl setup, the PID namespace, and task_struct lookups in the kernel initialization phase, and patched kernel components such as the LSM (Linux Security Module), the ucounts kernel API, and VFS mount. I also prototyped and tested the previous strategy of using LKL (Linux Kernel Library) to determine why it fails: the causes are task scheduler issues as well as limitations imposed by its uniprocessor architecture, CPU locks, and semaphore usage. I summarized my findings on the developer lists.

Please read through the design documentation that I maintained during the GSoC program!

Deliverables

This project was split into three phases. The deliverable code written in each of the three phases is listed below. All of the items below were submitted as merge requests, and some of them have already been merged. The remaining ones are currently being reviewed by mentors and project maintainers. Apart from the ones below, details about other bug fixes, code reviews and testing in which I was involved during GSoC 2021 can be found here


Phase 1
  1. Implementation of net-next-nuse-5.10 with Linux-5.10.47 as the base kernel
  2. Ubuntu-20.04 alignment and custom glibc-based DCE build
  3. Aligned the Bake script to automate build and setup procedures
Phase 2
  1. Re-implemented Python bindings compilation in DCE
  2. Added the --apiscan feature to DCE
Phase 3
  1. Fixed CircleCI
  2. GitHub Actions CI support
  3. Docker setup for DCE

Phase 1 details
  1. Using the Linux-5.10.47 LTS release as the base kernel, developed the net-next-nuse-5.10 port for DCE.
  2. Patched Linux kernel code for DCE-specific use cases (SLAB allocator, PID, task scheduler, etc.).
  3. The new Linux library can be loaded into DCE with the same API calls.
  4. Implemented a custom glibc-based DCE build: patched libc-2.31 to disable the security checks and aligned DCE accordingly.
  5. Added Bake build components to the script to automate the build process.

Phase 2 details
  1. Revived the use of Python bindings in DCE and implemented the compilation code for the DCE cpython shared library.
  2. Implemented the --apiscan feature for DCE. Unlike ns-3, which is a combination of multiple modules with individually defined dependencies, DCE is unimodular and compiles to a standalone project that depends on the ns-3 libraries as a whole.
  3. The API scanning process uses a combination of PyBindGen and CastXML.

Phase 3 details
  1. Fixed the failing CircleCI pipeline used in ns-3-dce.
  2. Drafted a GitHub Actions based CI with HEAD-based cache restoration.
  3. Implemented a Docker-based setup that needs almost no extra build steps and makes DCE available on almost all Linux systems. The environment is highly configurable from both the host machine and the container, and it includes synced volumes so that host changes to the project are reflected inside the container. The setup makes use of both docker-compose and docker.

Patches

Please find below the Google Drive shared folder, which contains all the patches generated throughout the GSoC 2021 program:

Collaborations and Efforts

Apart from the deliverable code that I produced, I also did the following on the side during the program:

  1. Debugged the Linux Kernel Library (LKL): investigated the process scheduler, CPU lock, and semaphore issues, and documented the technical limitations that make LKL a poor choice, at the moment, for integration with DCE as its Linux stack.
  2. Opened a thread on the LKL developer lists, with a summary of what is holding us back from using LKL in DCE.
  3. Prepared a prototype on top of LKL, to demonstrate where it failed. Code
  4. Attempted to resolve a couple of issues on the ns-3 official Google Group
  5. Contributed commits that add configure_arguments attribute usage to the depends_on tag itself and fix dependency lookups in Bake. The commits were merged into the Bake repository. Bake Commit
  6. Learned some really great debugging skills while my mentor resolved the bug in dce-umip-nemo.cc
  7. Collaborated with my mentor on resolving a bug with LTE scripts reported by a user.
  8. Implemented ethtool support (testing is yet to be done) on top of the ported Linux-5.10.47 kernel library, to provide netdevice statistics and user-defined device configuration. Code
  9. Debugged the dce-iperf.cc script and documented the technical reasons and the point of failure in the code. Details
  10. Participated in the code review of a PR by a developer that features a stdio hijack implementation on top of fopencookie. PR #128
  11. Suggested fixes and identified an execve-based edge case in which the FILE stream was popped off the streams stack and no log was flushed to the files.
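The logging symptom in item 11 may be easier to picture alongside a general property of execve: it replaces the process image, so any data still sitting in a userspace stdio buffer is discarded unless it was flushed first. The Python sketch below illustrates that general behaviour only; it is not the DCE code or the fix discussed in the PR, the file name and helper names are hypothetical, and it assumes a POSIX system.

```python
import os
import subprocess
import sys
import tempfile

# Source of a child process that writes one buffered log line and then
# replaces itself through exec, optionally flushing the buffer first.
CHILD = r"""
import os, sys
path, mode = sys.argv[1], sys.argv[2]
log = open(path, "w")            # regular file, so writes are buffered
log.write("log line before exec\n")
if mode == "flush":
    log.flush()                  # push the buffer to the kernel before exec
os.execv(sys.executable, [sys.executable, "-c", "pass"])
"""


def logged_text(flush: bool) -> str:
    """Run the child and return whatever reached the log file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")  # hypothetical log file name
        mode = "flush" if flush else "noflush"
        subprocess.run([sys.executable, "-c", CHILD, path, mode], check=True)
        with open(path) as f:
            return f.read()


if __name__ == "__main__":
    print("without flush:", repr(logged_text(flush=False)))
    print("with flush:   ", repr(logged_text(flush=True)))
```

The same reasoning applies to C stdio: a FILE stream has to be flushed before an exec call if its pending output is meant to survive.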
Steps to build the code for Phase 1
    Test Custom Glibc Build on Ubuntu-20.04
  1. Download Bake from the link provided above
  2. Configure Bake: ./bake.py configure -e dce-linux-dev
  3. Build: ./bake.py build
  4. Run the test suite: ./test.py
    Test net-next-nuse-5.10
  1. Download net-next-nuse-5.10 from the link provided above
  2. Configure: make defconfig ARCH=LIB
  3. Build: make library ARCH=LIB
  4. Copy arch/lib/tools/libsim-linux.5.10.so to bake/build/liblinux.so
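Steps 2 to 4 above could be scripted roughly as follows. This is a minimal sketch, not part of the project: the function name and the two directory arguments are hypothetical, and only the make commands and file paths come from the steps listed above.

```python
import shutil
import subprocess
from pathlib import Path

# Commands and paths copied from the build steps in the text.
BUILD_COMMANDS = [
    ["make", "defconfig", "ARCH=LIB"],
    ["make", "library", "ARCH=LIB"],
]
LIB_RELATIVE_PATH = Path("arch/lib/tools/libsim-linux.5.10.so")


def build_and_install(kernel_dir, bake_dir, run=subprocess.run):
    """Build the kernel library in kernel_dir and copy it into bake_dir/build.

    'run' defaults to subprocess.run and can be swapped for a stub in tests.
    """
    kernel_dir, bake_dir = Path(kernel_dir), Path(bake_dir)
    for command in BUILD_COMMANDS:
        # check=True stops at the first failing make invocation
        run(command, cwd=kernel_dir, check=True)
    built = kernel_dir / LIB_RELATIVE_PATH
    if not built.is_file():
        raise FileNotFoundError(f"expected build output is missing: {built}")
    target = bake_dir / "build" / "liblinux.so"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(built, target)
    return target
```

A call such as `build_and_install("net-next-nuse-5.10", "bake")` assumes that both checkouts sit in the current directory.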
Steps to build the code for Phase 2
  1. Download ns-3-dce from the link provided above
  2. Configure (from bake): ./bake.py configure -e dce-linux-dev
  3. Build ns-3-dev: ./bake.py build
  4. Run apiscan: ./waf --apiscan
  5. Run a Python script: ./waf --pyrun "location/of/python/script.py"
Steps to build the code for Phase 3
  1. To build the Docker image and reproduce the results, follow the steps provided in the README.
Possible Extensions

This project can be extended further as suggested below:

  1. Support for the Google BBR v2 kernel could be added.
  2. Testing and network simulation using scripts like flent.
Online References

More details about this project are available at the project wiki page link given below: