Geek Time Advanced Operations Training Camp: Week 1 Assignment

Author: 9527
  • 2022-10-22
    United States

Namespace

What is Namespace

Operating systems use virtual memory to make each user process believe it owns all of the physical memory. This is how the operating system virtualizes memory.

Similarly, time-sharing scheduling gives every process a fair share of the CPU, so each process behaves as if it had the CPU to itself while it runs. This is how the operating system virtualizes the CPU.

However, these two techniques only virtualize physical resources. A host also has many non-physical, global resources in the operating system, such as user permissions, the network protocol stack, and file system mount points. Linux namespaces make it possible to virtualize these non-physical global resources.

According to Wikipedia, namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources.

The feature works by assigning the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Namespaces are a fundamental aspect of containers on Linux.
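
One way to observe this partitioning on a running system is the /proc/<pid>/ns directory, where the kernel exposes one symbolic link per namespace that a process belongs to. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names list_namespaces and same_namespace are made up for illustration.

```shell
# List the namespaces of a process by reading the symlinks under /proc/<pid>/ns.
# Each link target looks like "mnt:[4026531840]"; the number identifies the namespace.
list_namespaces() {
    pid="${1:-self}"
    for link in /proc/"$pid"/ns/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

# Succeed when two processes are in the same namespace of the given type.
# $1 = namespace type (e.g. uts), $2 and $3 = PIDs (or "self").
same_namespace() {
    a="$(readlink "/proc/$2/ns/$1")" || return 2
    b="$(readlink "/proc/$3/ns/$1")" || return 2
    [ "$a" = "$b" ]
}

list_namespaces self
```

Two processes whose links for a given type point at the same target share that namespace; after an unshare of that type, the targets differ.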

How Namespace Works

Namespaces are a low-level Linux concept implemented in the kernel.

  • Each container runs as one or more processes on the host, managed by the container runtime, and all containers share the host kernel.

  • Each container needs an isolated running space, similar to a virtual machine. Container technology provides this by giving the processes of a service their own view of the file system, the network, the process tree, and so on, which also protects the host and other containers from interference.

  • Containers are isolated from each other mainly through the following namespaces:


Common Namespaces Used in Containers

MNT Namespace

MNT namespace provides isolation of disk mount points and file systems:

  • Each container must have its own independent root file system and an independent user space

  • It gives different processes on a system different views of the host’s filesystem


A new mount namespace is created using either the clone or the unshare system call with the CLONE_NEWNS flag. When a new mount namespace is created, its mount list is initialized as follows:

  • If the namespace is created using clone, the mount list of the child’s namespace is a copy of the mount list in the parent process’s mount namespace.

  • If the namespace is created using unshare, the mount list of the new namespace is a copy of the mount list in the caller’s previous mount namespace.

Demo

Run a shell in a new mount namespace

$ unshare --mount

Create a /tmp/mnt directory

$ mkdir /tmp/mnt
$ ls /tmp/mnt

Mount /usr/local/bin 

$ mount --bind /usr/local/bin /tmp/mnt/
$ ls /tmp/mnt/
2to3-3.10  idle3.10  pip3.10  pydoc3.10  python3.10  python3.10-config
$ findmnt | grep "tmp/mnt"
└─/tmp/mnt                            /dev/xvda1[/usr/local/bin]                         xfs        rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota

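Exit the namespace; the bind mount existed only in the new mount namespace, so it is not visible from the original one

$ exit
$ ls /tmp/mnt/pydoc3.10
ls: cannot access /tmp/mnt/pydoc3.10: No such file or directory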
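
To check from a script whether a path is a mount point in the current mount namespace, one option is to read /proc/self/mountinfo, which lists only the mounts visible to the calling process. This is a minimal sketch, assuming a Linux /proc and awk; is_mount_point is a hypothetical helper name, and the path must be written exactly as it appears in the file (no trailing slash).

```shell
# is_mount_point PATH [MOUNTINFO_FILE]
# Succeeds when PATH appears as a mount point (field 5) in the mountinfo file.
# The file defaults to the caller's own view of its mount namespace.
is_mount_point() {
    awk -v p="$1" '$5 == p { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mountinfo}"
}

# Example (output depends on which mount namespace the shell is in):
#   is_mount_point /tmp/mnt && echo "mounted here" || echo "not mounted here"
```

Run inside the shell started by unshare --mount after the bind mount, the check should succeed; run from the original shell it should fail, matching the ls result in the demo.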

IPC Namespace

The IPC namespace isolates System V IPC objects and POSIX message queues. The clone flag used to achieve this is CLONE_NEWIPC. Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.

Objects created in an IPC namespace are visible to all other processes that are members of that namespace, but are not visible to processes in other IPC namespaces.

Demo

List current IPC namespace

$ lsns | grep ipc
4026531839 ipc       112     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 21

Create IPC namespace:

$ unshare --ipc
$ lsns | grep ipc
4026531839 ipc       110     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532281 ipc         3 29447 root   -bash

UTS Namespace

The UTS namespace isolates two specific elements of the system that relate to the uname system call. The UTS (UNIX Time-Sharing) namespace is named after the data structure used to store information returned by the uname system call.

Specifically, the UTS namespace isolates the hostname and the NIS domain name. It enables a container to have its own hostname, which is independent of the host system and other containers on it.


Demo

Create UTS namespace:

$ unshare --fork --mount --uts /bin/bash


# Mount a private tmpfs over /run so the host's D-Bus socket is not reachable from this mount namespace
$ mount -t tmpfs tmpfs /run


$ hostnamectl set-hostname test.com
Failed to create bus connection: No such file or directory
$ hostname
cloud-dev.com
$ hostname test.com
$ hostname
test.com

If you now log into the server from a new terminal, the host still reports its original hostname:

$ ssh cloud-dev
$ hostname
cloud-dev.com

PID Namespace

PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID. PID namespaces allow containers to provide functionality such as suspending/resuming the set of processes in the container and migrating the container to a new host while the processes inside the container maintain the same PIDs.
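
On kernels that expose the NSpid field in /proc/<pid>/status (Linux 4.1 and later), the PID that a process has in each nested PID namespace can be read directly. The sketch below assumes awk is available; ns_pids is an illustrative helper name, not a standard tool.

```shell
# Print the PIDs from the NSpid line of /proc/<pid>/status text read from stdin.
# The leftmost value is the PID as seen from the namespace of the process that
# mounted this procfs; later values are the PIDs in successively nested namespaces.
ns_pids() {
    awk '$1 == "NSpid:" {
        for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
    }'
}

# Example: show the PID chain of the current shell, if the status file is readable.
if [ -r "/proc/$$/status" ]; then
    ns_pids < "/proc/$$/status"
fi
```

A process in the initial PID namespace prints a single number; a process started inside one nested PID namespace prints two, the second being its PID inside that namespace.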


Demo

Check current PID

$ echo $$
28877

As you can see, the current Bash shell has PID 28877. Now create a new PID namespace; the shell started inside it has PID 1.

$ unshare --pid -f
$ echo $$
1
$ lsns | grep pid
4026531836 pid       110     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532281 pid         3 29479 root   -bash

Net Namespace

Network namespaces can virtualize network stacks, and each network namespace has its own resources, such as network interfaces, IP addresses, routing tables, tunnels, firewalls, etc. For example, rules added to a network namespace by iptables will only affect traffic entering and leaving that namespace.

Each container is similar to a virtual machine in that it has its own network interface, listening ports, TCP/IP protocol stack, and so on. For example, the Docker runtime creates a veth pair for each container: one end is placed in the container's network namespace, and the other end (vethX) is attached to a bridge on the host, usually docker0, which is a Linux virtual network bridge. The container gets its own IP address on that bridge's network.


Demo

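Create a NET namespace and persist it by bind-mounting it onto /var/run/netns/testns (unshare expects this file to already exist)

$ unshare --net=/var/run/netns/testns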

List current NET namespace

$ lsns | grep net
4026532088 net       113     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532282 net         3 30639 root   -bash


$ ip netns
testns

Check new namespace interface

$ ip netns exec testns ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

User Namespace

User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs, the root directory, keys, and capabilities. A process’s user and group IDs can be different inside and outside a user namespace. 

In particular, a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.
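
The mapping between IDs inside and outside a user namespace is recorded in /proc/<pid>/uid_map, where each line has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates an inside UID to an outside UID for a single mapping line. map_uid is a hypothetical helper, and the sample values (an outside UID of 1000) are assumptions for illustration only.

```shell
# map_uid UID "INSIDE_START OUTSIDE_START LENGTH"
# Prints the outside UID that the given inside UID maps to, or fails if unmapped.
map_uid() {
    uid="$1"
    # Split the mapping line into its three whitespace-separated fields.
    set -- $2
    inside="$1"; outside="$2"; length="$3"
    if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + length)) ]; then
        echo $((uid - inside + outside))
    else
        echo "unmapped" >&2
        return 1
    fi
}

# A single-entry mapping of the kind --map-root-user sets up, assuming the
# caller's UID is 1000: inside UID 0 corresponds to outside UID 1000.
map_uid 0 "0 1000 1"
```

This is why a process can appear as root inside the namespace while the kernel still treats it as an ordinary user for anything owned by the parent namespace.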


Demo

Check current USER namespace

$ lsns | grep user
4026531837 user      116     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 21

Create a new USER namespace and attach it

$ unshare --map-root-user --user --fork
$ lsns | grep user
4026531835 cgroup      4 30413 root unshare --map-root-user --user --fork
4026531836 pid         4 30413 root unshare --map-root-user --user --fork
4026531838 uts         4 30413 root unshare --map-root-user --user --fork
4026531839 ipc         4 30413 root unshare --map-root-user --user --fork
4026531840 mnt         4 30413 root unshare --map-root-user --user --fork
4026532088 net         4 30413 root unshare --map-root-user --user --fork
4026532281 user        4 30413 root unshare --map-root-user --user --fork

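Try to update the hostname as the new root user. It fails because the process is root only inside the new user namespace; the UTS namespace still belongs to the original user namespace, where the process is unprivileged

$ id
uid=0(root) gid=0(root) groups=0(root)
$ hostname test.com
hostname: you must be root to change the host name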


Install Docker

The procedure to install Docker on Amazon Linux 2 running on either an EC2 or a Lightsail instance is as follows:

  • Log in to the remote AWS server using the ssh command:

$ ssh ec2-user@ec2-ip-address-dns-name-here

  • Apply pending updates using the yum command:

$ sudo yum update

  • Search for Docker package:

$ sudo yum search docker

  • Get version information:

$ sudo yum info docker

  • Install Docker:

$ sudo yum install docker

  • Enable the Docker service at boot time:

$ sudo systemctl enable docker.service

  • Start the Docker service:

$ sudo systemctl start docker.service

  • Check the Docker service status:

$ sudo systemctl status docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-10-17 05:03:52 EDT; 18s ago
     Docs: https://docs.docker.com
  Process: 3295 ExecStartPre=/usr/libexec/docker/docker-setup-runtimes.sh (code=exited, status=0/SUCCESS)
  Process: 3289 ExecStartPre=/bin/mkdir -p /run/docker (code=exited, status=0/SUCCESS)
 Main PID: 3312 (dockerd)
    Tasks: 9
   Memory: 39.9M
   CGroup: /system.slice/docker.service
           └─3312 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/c...

Sep 08 05:03:51 amazon.example.local dockerd[3312]: time="2022-10-17T05:03...
Sep 08 05:03:51 amazon.example.local dockerd[3312]: time="2021-10-17T05:03...


Docker volume

Bind mounts

Bind mounts have been around since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its absolute path on the host machine. By contrast, when you use a volume, a new directory is created within Docker’s storage directory on the host machine, and Docker manages that directory’s contents.

With the --mount syntax used below, the file or directory must already exist on the Docker host; otherwise Docker reports an error. With the older -v (--volume) syntax, a missing path is created on demand as a directory. Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available.

Example:

$ docker run -d -it --name nginx --mount type=bind,source="$(pwd)"/target,target=/app nginx:latest
$ docker inspect nginx | jq '.[0].Mounts'
[
  {
    "Type": "bind",
    "Source": "/root/target",
    "Destination": "/app",
    "Mode": "",
    "RW": true,
    "Propagation": "rprivate"
  }
]

You can use the following command to create a read-only bind mount:

$ docker run -d -it --name nginx --mount type=bind,source="$(pwd)"/target,target=/app,readonly nginx:latest
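
Because a bind mount depends on a specific path on the host, a script that starts containers may want to check that path before calling docker run. The following is a minimal sketch; bind_mount_arg is a hypothetical helper, not part of Docker, and it only builds the type=bind,... string already used in the commands above.

```shell
# bind_mount_arg SOURCE TARGET [readonly]
# Prints a value suitable for `docker run --mount`, after checking that the
# source path exists on the host. Fails with a message on stderr otherwise.
bind_mount_arg() {
    src="$1"; dst="$2"; mode="${3:-}"
    if [ ! -e "$src" ]; then
        echo "bind source does not exist: $src" >&2
        return 1
    fi
    arg="type=bind,source=$src,target=$dst"
    if [ "$mode" = "readonly" ]; then
        arg="$arg,readonly"
    fi
    printf '%s\n' "$arg"
}

# Usage (requires Docker, so it is left commented out):
#   docker run -d --name nginx \
#       --mount "$(bind_mount_arg "$(pwd)/target" /app readonly)" nginx:latest
```

Failing early in the script gives a clearer error than discovering a missing or mistyped host path after the container has been configured.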

Volumes

In containers, volumes are the preferred mechanism for persisting data. While bind mounts depend on the directory structure and OS of the host machine, volumes are completely managed by Docker.

In addition, volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container.

Create volume

$ docker volume create my-vol
my-vol
$ docker volume list
DRIVER    VOLUME NAME
local     my-vol
$ docker volume inspect my-vol
[
    {
        "CreatedAt": "2022-10-21T13:33:36-04:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/my-vol/_data",
        "Name": "my-vol",
        "Options": {},
        "Scope": "local"
    }
]

Remove volume

$ docker volume rm my-vol

Mount volume

$ docker run -d --name nginx --mount source=my-vol,target=/app nginx:latest
5d0a2740b83cde52c0a167ae9bb4a80ae655569b36906a040477417258d0cf2f
$ docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS     NAMES
5d0a2740b83c   nginx:latest   "/docker-entrypoint.…"   42 seconds ago   Up 42 seconds   80/tcp    nginx
$ docker inspect nginx | jq '.[0].Mounts'
[
  {
    "Type": "volume",
    "Name": "my-vol",
    "Source": "/var/lib/docker/volumes/my-vol/_data",
    "Destination": "/app",
    "Driver": "local",
    "Mode": "z",
    "RW": true,
    "Propagation": ""
  }
]

Docker network

Bridge network

In terms of networking, a bridge network is a Link Layer device which forwards traffic between network segments. A bridge can be a hardware device or a software device running within a host machine’s kernel.

In terms of Docker, a bridge network uses a software bridge which allows containers connected to the same bridge network to communicate, while providing isolation from containers which are not connected to that bridge network. The Docker bridge driver automatically installs rules in the host machine so that containers on different bridge networks cannot communicate directly with each other.

Create a bridge

$ docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
e0265059dda5   bridge    bridge    local
2c2542ac2d5a   host      host      local
5d2271652a97   none      null      local
$ docker network create my-net
1fb45cdb7fc5968c2764f41961376d1bff7f2e020cd4f99607ed0b83d4fd8544
$ docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
e0265059dda5   bridge    bridge    local
2c2542ac2d5a   host      host      local
1fb45cdb7fc5   my-net    bridge    local
5d2271652a97   none      null      local

Use bridge

$ docker create --name nginx --network my-net --publish 8080:80 nginx:latest
143edeaa0c5192ebacbe8858827c99034a4d99f8efc64ed5ad7db203237dfeaa
$ docker start 143edeaa0c51
143edeaa0c51
$ docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS         PORTS                                   NAMES
143edeaa0c51   nginx:latest   "/docker-entrypoint.…"   46 seconds ago   Up 2 seconds   0.0.0.0:8080->80/tcp, :::8080->80/tcp   nginx
$ curl http://127.0.0.1:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Container network

# Check the nginx container's network info
$ docker exec -it nginx ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
# Start busybox in the nginx container's network namespace; it reports the same eth0 MAC and IP address
$ docker run -it --net container:nginx busybox
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
f5b7ce95afea: Pull complete
Digest: sha256:9810966b5f712084ea05bf28fc8ba2c8fb110baa2531a10e2da52c1efc504698
Status: Downloaded newer image for busybox:latest
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
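
To confirm from a script that two containers share one network namespace, one can compare the IPv4 addresses each of them reports. The sketch below extracts the inet entries from ip addr output; inet_addrs is an illustrative helper name, and the commented usage assumes the nginx container from the steps above is still running.

```shell
# Print the IPv4 address/prefix pairs found in `ip addr` output read from stdin.
inet_addrs() {
    awk '$1 == "inet" { print $2 }'
}

# Usage (requires Docker and the running nginx container, so it is left commented out):
#   docker exec nginx ip addr | inet_addrs
#   docker run --rm --net container:nginx busybox ip addr | inet_addrs
# Identical output from both commands suggests a shared network namespace.
```

Comparing addresses is only a quick heuristic; it works here because a container attached with --net container:nginx has no network interfaces of its own.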

