[systemd-devel] Automatic generation of a part of a service unit file

Topi Miettinen toiwoton at gmail.com
Sun Aug 7 21:42:31 UTC 2016


Hello,

I made a small systemtap script which can generate part of the
configuration for a systemd service. While the traced process runs, the
script produces strace-like output annotated with information gathered
from various kernel probes. When the process exits, a summary is printed
in systemd unit file format. The purpose of the script is to help system
administrators, distro maintainers and program developers prepare
better unit files.

The systemtap probes check for:
 * capability use for CapabilityBoundingSet
 * device access for DeviceAllow
 * address family use for RestrictAddressFamilies
 * RLIMIT related information
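
As a rough illustration of the first item: the capability probe can be
thought of as accumulating a bitmask with one bit per capability number,
which is turned into names for the summary. The Python sketch below is
not part of the script. It uses only a handful of the capability numbers
from linux/capability.h, and the function name caps_to_directive and the
list of observed checks are made up for this example.

```python
# Illustration only: a few capability numbers from linux/capability.h.
CAP_NAMES = {
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    21: "CAP_SYS_ADMIN",
    37: "CAP_AUDIT_READ",
}

def caps_to_directive(mask):
    """Render a bitmask of used capabilities as a CapabilityBoundingSet line."""
    names = [name for bit, name in sorted(CAP_NAMES.items()) if mask & (1 << bit)]
    return "CapabilityBoundingSet=" + " ".join(names)

used = 0
for cap in (13, 12, 13):      # hypothetical capability checks seen while tracing
    used |= 1 << cap          # repeated checks set the same bit again

print(caps_to_directive(used))
# prints: CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW
```

The same bitmask approach would work for address families, with AF_*
numbers instead of capability numbers.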

File system accesses are also checked against the ProtectSystem/ProtectHome
requirements, and mmap()/mprotect() flags against what
MemoryDenyWriteExecute=true allows. For example, if the process never
writes to /boot, /etc or /usr, we can set ProtectSystem=full. If the
script detects a write to /etc, this is degraded to ProtectSystem=true,
and a write to /boot or /usr degrades it to ProtectSystem=false. For
InaccessibleDirectories, the user specifies a list of candidate paths
which could be made inaccessible. Any candidate that the process accesses
is dropped from the list, and the remaining paths are proposed as
inaccessible.
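
The two decisions in this paragraph can be sketched in a few lines of
Python. This is an illustration, not code from the script: the function
names are invented, and matching by path prefix is an assumption about
how writes below a directory should be counted. The mapping of /etc to
ProtectSystem=true and of /usr or /boot to ProtectSystem=false follows
what systemd protects at each level.

```python
import posixpath

def protect_system_level(written_paths):
    """Pick the strictest ProtectSystem value compatible with observed writes."""
    def under(path, top):
        return path == top or path.startswith(top + "/")

    level = "full"
    for path in written_paths:
        if under(path, "/usr") or under(path, "/boot"):
            return "false"      # even ProtectSystem=true keeps these read-only
        if under(path, "/etc"):
            level = "true"      # only ProtectSystem=full protects /etc
    return level

def remaining_inaccessible(candidates, accessed_paths):
    """Drop every candidate that is an accessed path or a parent of one."""
    touched = set()
    for path in accessed_paths:
        path = posixpath.normpath(path)
        while True:
            touched.add(path)
            parent = posixpath.dirname(path)
            if parent == path or parent == "":
                break           # reached the root, or the path was not absolute
            path = parent
    return [c for c in candidates if c not in touched]

print(protect_system_level(["/run/wpa_supplicant/wlan0"]))
# prints: full
print(remaining_inaccessible(["/boot", "/home", "/run", "/var"],
                             ["/run/dbus/system_bus_socket", "/var/run"]))
# prints: ['/boot', '/home']
```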

A list of system calls is produced for SystemCallFilter.
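
The begin probe of the attached script contains a small table
(syscalls_for_seccomp) that translates some systemtap system call names
into the names seccomp uses. How the script applies that table is not
shown here, so the following Python sketch is only one plausible way to
build the line. The function name and the observed call list are
hypothetical.

```python
# Name differences between the tracer and seccomp, as listed in the script.
SECCOMP_NAMES = {
    "fstatat": "fstatat64",
    "mmap2": "mmap",
    "pread": "pread64",
    "pwrite": "pwrite64",
}

def syscall_filter(observed):
    """Build a SystemCallFilter line from observed system call names."""
    # A set removes duplicates, including two tracer names that map
    # to the same seccomp name (mmap2 and mmap).
    names = sorted({SECCOMP_NAMES.get(name, name) for name in observed})
    return "SystemCallFilter=" + " ".join(names)

print(syscall_filter(["open", "mmap2", "read", "mmap", "pread", "read"]))
# prints: SystemCallFilter=mmap open pread64 read
```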

When in doubt, the strace part can be used to verify how the summary was
produced.
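
One way to do such a cross-check is to collect the bracketed Key=value
annotations from the trace and compare them with the summary. The Python
sketch below is not part of the script. It assumes that each system call
and its annotations are on a single unwrapped line, and it splits values
on whitespace, which suits list-valued settings such as
RestrictAddressFamilies.

```python
import re
from collections import defaultdict

# Matches annotations of the form [Key=value ...]; markers such as
# [NOFILE 3 -> 4] have no "=" after the key and are skipped.
ANNOTATION = re.compile(r"\[(\w+)=([^\]]*)\]")

def collect_annotations(lines):
    """Map each annotation key to the set of values seen in the trace."""
    found = defaultdict(set)
    for line in lines:
        for key, value in ANNOTATION.findall(line):
            found[key].update(value.split())
    return found

trace = [
    "socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4 "
    "[RestrictAddressFamilies=AF_UNIX] [NOFILE 3 -> 4]",
    "socket(PF_NETLINK, SOCK_RAW, 0) = 5 [RestrictAddressFamilies=AF_NETLINK]",
]
print(sorted(collect_annotations(trace)["RestrictAddressFamilies"]))
# prints: ['AF_NETLINK', 'AF_UNIX']
```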

Short sample output with wpa_supplicant (looks better on a wide terminal
without line wrapping):
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4 [RestrictAddressFamilies=AF_UNIX] [NOFILE 3 -> 4]
socket(PF_NETLINK, SOCK_RAW, 0) = 5 [RestrictAddressFamilies=AF_NETLINK]
connect(4, {AF_UNIX, "/var/run/dbus/system_bus_socket"}, 33) = 0 [ReadWriteDirectories=/run/dbus/system_bus_socket ] [InaccessibleDirectories=~/ /run /run/dbus /run/dbus/system_bus_socket /var ]
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 14 [DeviceAllow=/dev/char/1:9 r ] [NOFILE 13 -> 14] [InaccessibleDirectories=~/ /dev /dev/urandom ]

Summary:
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW
# Consider also possibly missing CapabilityBoundingSet=CAP_SYS_ADMIN
ProtectHome=true
ProtectSystem=full
DevicePolicy=strict
DeviceAllow=/dev/char/1:3 rw
DeviceAllow=/dev/char/1:8 r
DeviceAllow=/dev/char/1:9 r
DeviceAllow=/dev/char/10:58 r
# LimitFSIZE=0
# LimitDATA=577536
# LimitSTACK=139264
# LimitCORE=0
# LimitNOFILE=15
# LimitAS=45146112
# LimitNPROC=159
# LimitMEMLOCK=0
# LimitSIGPENDING=0
# LimitMSGQUEUE=0
# LimitNICE=0
# LimitRTPRIO=0
RestrictAddressFamilies=AF_UNIX AF_INET AF_NETLINK AF_PACKET
MemoryDenyWriteExecute=true
SystemCallFilter=access alarm arch_prctl bind brk chmod clock_getres clock_gettime close connect execve exit_group fcntl fstat geteuid getresgid getresuid getrlimit getsockname getuid ioctl mkdir mmap mprotect munmap open poll read recvfrom recvmsg rmdir rt_sigaction rt_sigprocmask rt_sigreturn rt_sigsuspend select sendmsg sendto set_robust_list set_tid_address setsockopt socket statfs unlink write
InaccessibleDirectories=-/bin
InaccessibleDirectories=-/boot
InaccessibleDirectories=-/dev/hugepages
InaccessibleDirectories=-/dev/mqueue
InaccessibleDirectories=-/dev/pts
InaccessibleDirectories=-/dev/shm
InaccessibleDirectories=-/home
InaccessibleDirectories=-/lost+found
InaccessibleDirectories=-/media
InaccessibleDirectories=-/mnt
InaccessibleDirectories=-/opt
InaccessibleDirectories=-/proc/bus
InaccessibleDirectories=-/proc/sys
InaccessibleDirectories=-/root
InaccessibleDirectories=-/srv
InaccessibleDirectories=-/tmp
InaccessibleDirectories=-/usr/bin
InaccessibleDirectories=-/usr/sbin
InaccessibleDirectories=-/var/tmp
ReadOnlyDirectories=/
ReadWriteDirectories=/dev/null /run /run/dbus/system_bus_socket /run/wpa_supplicant /run/wpa_supplicant/wlan0 socket:[38833]

With some editing, this is pretty much valid for my system, even though
the script is not yet perfect. For example, the last line only really
needs /run/wpa_supplicant. I do not trust the RLIMIT values yet, and
systemtap itself causes problems: it needs to be run as root, and its
system call names don't always match those used by seccomp. Perhaps
there should be a way for systemd to start the process directly, without
staprun in between, and then tell staprun about the process.

I suppose the script could find a home in the systemd repository
(as it's fairly specific to systemd), in systemtap (it's just another
script) or just somewhere on GitHub if nobody cares. Would it be
interesting for systemd?

As future features, it may be possible to probe which settings for
NoNewPrivileges or SecureBits could be used. This might need small
changes to the kernel. PrivateTmp and PrivateNetwork could possibly be
generated in some cases, MountFlags probably not.

-Topi
-------------- next part --------------
#! /bin/sh

# suppress some run-time errors here for cleaner output
//bin/true && exec stap --suppress-handler-errors --skip-badvars "$0" ${1+"$@"}

/*
 * Compile:
 * stap -p4 -DSTP_NO_OVERLOAD -m strace
 * Run:
 * /usr/bin/staprun -R -c "/sbin/wpa_supplicant -u -O /run/wpa_supplicant -c /etc/wpa_supplicant.conf -i wlan0" -w /root/strace.ko only_capability_use=1 timestamp=0
 */

/* configuration options; set these with stap -G */
global follow_fork = 0   /* -Gfollow_fork=1 means trace descendant processes too */
global timestamp = 1     /* -Gtimestamp=0 means don't print a syscall timestamp */
global elapsed_time = 0  /* -Gelapsed_time=1 means print a syscall duration too */
global only_capability_use = 0 /* -Gonly_capability_use=1 means print only when capabilities are used */
global inaccessible_candidates = "/bin /boot /dev /dev/hugepages /dev/mqueue /dev/pts /dev/shm /home /lost+found /media /mnt /opt /proc /proc/bus /proc/sys /root /sbin /srv /sys /sys/fs /usr/bin /usr/sbin /tmp /var /var/tmp"

global thread_argstr%
global thread_time%

global syscalls_nonreturn[2]
global capnames[64]
global used_caps
global missing_caps
global all_used_caps
global all_missing_caps
global accessed_devices[1000]
global all_accessed_devices[1000]
global highwatermark_fsize
global highwatermark_data
global highwatermark_stack
global highwatermark_core
global highwatermark_nproc
global highwatermark_nofile
global highwatermark_memlock
global highwatermark_as
global highwatermark_sigpending
global highwatermark_msgqueue
global highwatermark_nice
global highwatermark_rtprio
global old_highwatermark_fsize
global old_highwatermark_data
global old_highwatermark_stack
global old_highwatermark_core
global old_highwatermark_nproc
global old_highwatermark_nofile
global old_highwatermark_memlock
global old_highwatermark_as
global old_highwatermark_sigpending
global old_highwatermark_msgqueue
global old_highwatermark_nice
global old_highwatermark_rtprio
global afnames%
global used_afs
global missing_afs
global all_used_afs
global all_missing_afs
global no_memory_deny_write_execute
global all_memory_deny_write_execute = "true"
global used_syscalls%
global syscalls_for_seccomp%
global accessed_paths%
global all_accessed_paths%
global written_paths%
global all_written_paths%
global inaccessibles%
global protect_system_paths%
global protect_system = "full"
global protect_home_paths%
global protect_home = "true"
global print_syscall


probe begin 
  {
    /* list those syscalls that never .return */
    syscalls_nonreturn["exit"]=1
    syscalls_nonreturn["exit_group"]=1

    // egrep '#define CAP_.*[0-9]+$' /usr/src/linux-headers*/include/uapi/linux/capability.h | awk '{ print "capnames[" $3 "] = \"" $2 "\";" }'
    capnames[0] = "CAP_CHOWN";
    capnames[1] = "CAP_DAC_OVERRIDE";
    capnames[2] = "CAP_DAC_READ_SEARCH";
    capnames[3] = "CAP_FOWNER";
    capnames[4] = "CAP_FSETID";
    capnames[5] = "CAP_KILL";
    capnames[6] = "CAP_SETGID";
    capnames[7] = "CAP_SETUID";
    capnames[8] = "CAP_SETPCAP";
    capnames[9] = "CAP_LINUX_IMMUTABLE";
    capnames[10] = "CAP_NET_BIND_SERVICE";
    capnames[11] = "CAP_NET_BROADCAST";
    capnames[12] = "CAP_NET_ADMIN";
    capnames[13] = "CAP_NET_RAW";
    capnames[14] = "CAP_IPC_LOCK";
    capnames[15] = "CAP_IPC_OWNER";
    capnames[16] = "CAP_SYS_MODULE";
    capnames[17] = "CAP_SYS_RAWIO";
    capnames[18] = "CAP_SYS_CHROOT";
    capnames[19] = "CAP_SYS_PTRACE";
    capnames[20] = "CAP_SYS_PACCT";
    capnames[21] = "CAP_SYS_ADMIN";
    capnames[22] = "CAP_SYS_BOOT";
    capnames[23] = "CAP_SYS_NICE";
    capnames[24] = "CAP_SYS_RESOURCE";
    capnames[25] = "CAP_SYS_TIME";
    capnames[26] = "CAP_SYS_TTY_CONFIG";
    capnames[27] = "CAP_MKNOD";
    capnames[28] = "CAP_LEASE";
    capnames[29] = "CAP_AUDIT_WRITE";
    capnames[30] = "CAP_AUDIT_CONTROL";
    capnames[31] = "CAP_SETFCAP";
    capnames[32] = "CAP_MAC_OVERRIDE";
    capnames[33] = "CAP_MAC_ADMIN";
    capnames[34] = "CAP_SYSLOG";
    capnames[35] = "CAP_WAKE_ALARM";
    capnames[36] = "CAP_BLOCK_SUSPEND";
    capnames[37] = "CAP_AUDIT_READ";

    //egrep '#define AF_.*' /usr/src/linux-headers-*/include/linux/socket.h | awk '{ print "afnames[" $3 "] = \"" $2 "\"" }'
    afnames[0] = "AF_UNSPEC"
    afnames[1] = "AF_UNIX"
    afnames[2] = "AF_INET"
    afnames[3] = "AF_AX25"
    afnames[4] = "AF_IPX"
    afnames[5] = "AF_APPLETALK"
    afnames[6] = "AF_NETROM"
    afnames[7] = "AF_BRIDGE"
    afnames[8] = "AF_ATMPVC"
    afnames[9] = "AF_X25"
    afnames[10] = "AF_INET6"
    afnames[11] = "AF_ROSE"
    afnames[12] = "AF_DECnet"
    afnames[13] = "AF_NETBEUI"
    afnames[14] = "AF_SECURITY"
    afnames[15] = "AF_KEY"
    afnames[16] = "AF_NETLINK"
    afnames[17] = "AF_PACKET"
    afnames[18] = "AF_ASH"
    afnames[19] = "AF_ECONET"
    afnames[20] = "AF_ATMSVC"
    afnames[21] = "AF_RDS"
    afnames[22] = "AF_SNA"
    afnames[23] = "AF_IRDA"
    afnames[24] = "AF_PPPOX"
    afnames[25] = "AF_WANPIPE"
    afnames[26] = "AF_LLC"
    afnames[27] = "AF_IB"
    afnames[28] = "AF_MPLS"
    afnames[29] = "AF_CAN"
    afnames[30] = "AF_TIPC"
    afnames[31] = "AF_BLUETOOTH"
    afnames[32] = "AF_IUCV"
    afnames[33] = "AF_RXRPC"
    afnames[34] = "AF_ISDN"
    afnames[35] = "AF_PHONET"
    afnames[36] = "AF_IEEE802154"
    afnames[37] = "AF_CAIF"
    afnames[38] = "AF_ALG"
    afnames[39] = "AF_NFC"
    afnames[40] = "AF_VSOCK"
    afnames[41] = "AF_KCM"

    syscalls_for_seccomp["fstatat"] = "fstatat64"
    syscalls_for_seccomp["mmap2"] = "mmap"
    syscalls_for_seccomp["pread"] = "pread64"
    syscalls_for_seccomp["pwrite"] = "pwrite64"

    str = tokenize(inaccessible_candidates, " ")
    while (str != "") {
      inaccessibles[str] = 0
      str = tokenize("", " ")
    }

    protect_system_paths["/boot"] = 1
    protect_system_paths["/etc"] = 1
    protect_system_paths["/usr"] = 1
    # Additional ProtectSystem directories in Debian
    protect_system_paths["/bin"] = 1
    protect_system_paths["/lib"] = 1
    protect_system_paths["/lib64"] = 1
    protect_system_paths["/sbin"] = 1

    protect_home_paths["/home"] = 1
    protect_home_paths["/root"] = 1
    protect_home_paths["/run/user"] = 1
 }



function filter_p()
  {
    if (target() == 0) return 0; /* system-wide */
    if (!follow_fork && pid() != target()) return 1; /* single-process */
    if (follow_fork && !target_set_pid(pid())) return 1; /* multi-process */
    return 0;
  }

function caps_to_str(caps)
  {
    str = ""
    for (i = 0; i <= 37; i++) # up to and including CAP_LAST_CAP (CAP_AUDIT_READ)
      if (caps & (1 << i)) {
        str .= capnames[i]
	if ((caps & ~((1 << (i + 1)) - 1)) != 0)
	  str .= " "
      }
    return str
  }

function dev_to_str(type, dev, access)
  {
    devs = "/dev/"
    if (type == 1) # DEV_BLOCK
      devs .= "block"
    else
      devs .= "char"
    devs .= sprintf("/%d:%d ", dev >> 32, dev & 0xffffffff)
    if (access & 2) # ACC_READ
      devs .= "r"
    if (access & 4) # ACC_WRITE
      devs .= "w"
    if (access & 1) # ACC_MKNOD
      devs .= "m"
    return devs
  }

function afs_to_str(afs)
  {
    str = ""
    for (i = 0; i < 42; i++) # AF_MAX
      if (afs & (1 << i)) {
        str .= afnames[i]
	if ((afs & ~((1 << (i + 1)) - 1)) != 0)
	  str .= " "
      }
    return str
  }

/* Capabilities */
probe kernel.function("cap_capable@security/commoncap.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $audit)
      used_caps |= 1 << $cap;
    else
      missing_caps |= 1 << $cap;
  }

/* Devices */
probe kernel.function("__devcgroup_check_permission@security/device_cgroup.c").return
  {
    if (filter_p()) next;

    if ($return == 0)
      accessed_devices[$type, $major << 32 | $minor] |= $access
  }

/* RLIMIT_FSIZE */
probe kernel.function("inode_newsize_ok@fs/attr.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && highwatermark_fsize < $offset)
      highwatermark_fsize = $offset
  }

/* RLIMIT_DATA */
probe kernel.function("prctl_set_mm@kernel/sys.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && highwatermark_data < $prctl_map->end_data - $prctl_map->start_data) {
      highwatermark_data = $prctl_map->end_data - $prctl_map->start_data
      print_syscall = 1
    }
  }

probe kernel.function("do_brk@mm/mmap.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return > 0 && highwatermark_data < task->mm->data_vm << 12) { # PAGE_SHIFT
      highwatermark_data = task->mm->data_vm << 12
      print_syscall = 1
    }
    if ($return > 0 && highwatermark_as < task->mm->total_vm << 12) {
      highwatermark_as = task->mm->total_vm << 12
      print_syscall = 1
    }
  }

/* also RLIMIT_STACK and RLIMIT_MEMLOCK */
probe kernel.function("vm_stat_account@mm/mmap.c").return
  {
    if (filter_p()) next;

    if (highwatermark_data < $mm->data_vm << 12) { # PAGE_SHIFT
      highwatermark_data = $mm->data_vm << 12
      print_syscall = 1
    }
    if (highwatermark_stack < $mm->stack_vm << 12) {
      highwatermark_stack = $mm->stack_vm << 12
      print_syscall = 1
    }
    if (highwatermark_memlock < atomic_long_read(&$mm->locked_vm) << 12) {
      highwatermark_memlock = atomic_long_read(&$mm->locked_vm) << 12
      print_syscall = 1
    }
    if (highwatermark_as < $mm->total_vm << 12) {
      highwatermark_as = $mm->total_vm << 12
      print_syscall = 1
    }
  }

/* RLIMIT_CORE */
probe kernel.function("dump_emit@fs/coredump.c").return
  {
    if (filter_p()) next;

    if (highwatermark_core < $cprm->written) {
      highwatermark_core = $cprm->written
      print_syscall = 1
    }
  }

/* RLIMIT_NPROC */
probe kernel.function("commit_creds@kernel/cred.c").return
  {
    if (filter_p()) next;

    if (highwatermark_nproc < atomic_read(&$new->user->processes)) {
      highwatermark_nproc = atomic_read(&$new->user->processes)
      print_syscall = 1
    }
  }

probe kernel.function("copy_process@kernel/fork.c").return
  {
    if (filter_p()) next;

    try {
      /* $return is a task_struct pointer or an ERR_PTR() value */
      if (($return > 0 || $return < -1000) && $return->real_cred && $return->real_cred->user) {
        if (highwatermark_nproc < atomic_read(&$return->real_cred->user->processes)) {
          highwatermark_nproc = atomic_read(&$return->real_cred->user->processes)
          print_syscall = 1
        }
      }
    } catch {}
  }

/* RLIMIT_NOFILE */
probe kernel.function("__alloc_fd@fs/file.c").return
  {
    if (filter_p()) next;

    if (($return >= 0 || $return < -1000) && highwatermark_nofile < $return) {
      highwatermark_nofile = $return
      print_syscall = 1
    }
  }

probe kernel.function("do_dup2@fs/file.c").return
  {
    if (filter_p()) next;

    if (($return >= 0 || $return < -1000) && highwatermark_nofile < $return) {
      highwatermark_nofile = $return
      print_syscall = 1
    }
  }

/* RLIMIT_MEMLOCK */
probe kernel.function("sys_bpf@kernel/bpf/syscall.c").return
  {
    if (filter_p()) next;

    task = task_current()
    user = task->real_cred->user
    if ($return == 0 && highwatermark_memlock < atomic_long_read(&user->locked_vm) << 12) { # PAGE_SHIFT
      highwatermark_memlock = atomic_long_read(&user->locked_vm) << 12
      print_syscall = 1
    }
  }

probe kernel.function("perf_mmap@kernel/events/core.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return == 0 && highwatermark_memlock < task->mm->pinned_vm << 12) { # PAGE_SHIFT
      highwatermark_memlock = task->mm->pinned_vm << 12
      print_syscall = 1
    }
  }

probe kernel.function("do_mlock@mm/mlock.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return == 0 && highwatermark_memlock < task->mm->locked_vm << 12) { # PAGE_SHIFT
      highwatermark_memlock = task->mm->locked_vm << 12
      print_syscall = 1
    }
  }

probe kernel.function("sys_mlockall@mm/mlock.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return == 0 && highwatermark_memlock < task->mm->total_vm << 12) { # PAGE_SHIFT
      highwatermark_memlock = task->mm->total_vm << 12
      print_syscall = 1
    }
  }

/* RLIMIT_SIGPENDING */
probe kernel.function("__sigqueue_alloc@kernel/signal.c").return
  {
    if (filter_p()) next;

    task = task_current()
    user = task->real_cred->user
    /* __sigqueue_alloc() returns a pointer, NULL on failure */
    if ($return != 0 && highwatermark_sigpending < atomic_read(&user->sigpending)) {
      highwatermark_sigpending = atomic_read(&user->sigpending)
      print_syscall = 1
    }
  }

/* RLIMIT_MSGQUEUE */
probe kernel.function("mqueue_get_inode@ipc/mqueue.c").return
  {
    if (filter_p()) next;

    task = task_current()
    user = task->real_cred->user
    /* mqueue_get_inode() returns an inode pointer or an ERR_PTR() value */
    if (($return > 0 || $return < -1000) && highwatermark_msgqueue < user->mq_bytes) {
      highwatermark_msgqueue = user->mq_bytes
      print_syscall = 1
    }
  }

/* RLIMIT_NICE */
probe kernel.function("set_user_nice@kernel/sched/core.c").return
  {
    if (filter_p()) next;

    if (highwatermark_nice < $nice) {
      highwatermark_nice = $nice
      print_syscall = 1
    }
  }

/* RLIMIT_RTPRIO */
probe kernel.function("__sched_setscheduler@kernel/sched/core.c").return
  {
    if (filter_p()) next;

    if (highwatermark_rtprio < $attr->sched_priority) {
      highwatermark_rtprio = $attr->sched_priority
      print_syscall = 1
    }
  }

/* socket address families */
probe kernel.function("__sock_create@net/socket.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      used_afs |= 1 << $family
      print_syscall = 1
    } else if ($return == -93) { # -EPROTONOSUPPORT; the kernel returns negative errno values
      missing_afs |= 1 << $family
      print_syscall = 1
    }
  }

/* mmap flags */
probe kernel.function("do_mmap@mm/mmap.c").return
  {
    if (filter_p()) next;

    if (($return >= 0 || $return < -1000) && ($prot & (2 | 4)) == (2 | 4)) { # PROT_WRITE | PROT_EXEC
      no_memory_deny_write_execute = 1
      print_syscall = 1
    }
  }

/* path checks */
probe kernel.function("security_path_mknod@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($dir)]++
      written_paths[fullpath_struct_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_mkdir@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($dir)]++
      written_paths[fullpath_struct_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_rmdir@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($dir)]++
      written_paths[fullpath_struct_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_unlink@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($dir)]++
      written_paths[fullpath_struct_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_symlink@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($dir)]++
      written_paths[fullpath_struct_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_link@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($new_dir)]++
      written_paths[fullpath_struct_path($new_dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_rename@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($old_dir)]++
      written_paths[fullpath_struct_path($old_dir)]++
      accessed_paths[fullpath_struct_path($new_dir)]++
      written_paths[fullpath_struct_path($new_dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_truncate@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($path)]++
      written_paths[fullpath_struct_path($path)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_chmod@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($path)]++
      written_paths[fullpath_struct_path($path)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_chown@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($path)]++
      written_paths[fullpath_struct_path($path)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_path_chroot@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($path)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_create@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_link@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_unlink@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_symlink@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_mkdir@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_rmdir@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_mknod@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($dir)]++
      written_paths[inode_path($dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_rename@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($old_dir)]++
      written_paths[inode_path($old_dir)]++
      accessed_paths[inode_path($new_dir)]++
      written_paths[inode_path($new_dir)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_readlink@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x\n", pp(), $dentry);
      accessed_paths[d_path($dentry)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_follow_link@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x inode 0x%x\n", pp(), $dentry, $inode);
      accessed_paths[inode_path($inode)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_permission@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($inode)]++
      if ($mask & (0x00000002 | 0x00000008)) # MAY_WRITE | MAY_APPEND
        written_paths[inode_path($inode)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_setattr@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x\n", pp(), $dentry);
      accessed_paths[d_path($dentry)]++
      written_paths[d_path($dentry)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_getattr@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_path($path)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_setxattr@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x\n", pp(), $dentry);
      accessed_paths[d_path($dentry)]++
      written_paths[d_path($dentry)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_getxattr@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x\n", pp(), $dentry);
      accessed_paths[d_path($dentry)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_removexattr@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x\n", pp(), $dentry);
      accessed_paths[d_path($dentry)]++
      written_paths[d_path($dentry)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_getsecurity@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($inode)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_setsecurity@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($inode)]++
      written_paths[inode_path($inode)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_listsecurity@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($inode)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_getsecid@security/security.c").return
  {
    if (filter_p()) next;

    accessed_paths[inode_path($inode)]++
    print_syscall = 1
  }

probe kernel.function("security_file_permission@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_file(task_current(), $file)]++
      if ($mask & (0x00000002 | 0x00000008)) # MAY_WRITE | MAY_APPEND
        written_paths[fullpath_struct_file(task_current(), $file)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_file_set_fowner@security/security.c").return
  {
    if (filter_p()) next;

    accessed_paths[fullpath_struct_file(task_current(), $file)]++
    written_paths[fullpath_struct_file(task_current(), $file)]++
    print_syscall = 1
  }

probe kernel.function("security_file_open@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[fullpath_struct_file(task_current(), $file)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_setsecctx@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $dentry > 1000) {
      printf("func %s dentry 0x%x\n", pp(), $dentry);
      accessed_paths[d_path($dentry)]++
      written_paths[d_path($dentry)]++
      print_syscall = 1
    }
  }

probe kernel.function("security_inode_getsecctx@security/security.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      accessed_paths[inode_path($inode)]++
      print_syscall = 1
    }
  }

/* system call printing */
probe nd_syscall.* 
  {
    # TODO: filter out apparently-nested syscalls (that are implemented
    # in terms of each other within the kernel); PR6762

    if (filter_p()) next;

    used_syscalls[name]++

    thread_argstr[tid()]=argstr
    if (timestamp || elapsed_time)
      thread_time[tid()]=gettimeofday_us()

    if (name in syscalls_nonreturn)
      report(name,argstr,"")
  }

probe nd_syscall.*.return
  {
    if (filter_p()) next;

    report(name,thread_argstr[tid()],retstr)
  }

function report(syscall_name, syscall_argstr, syscall_retstr)
  {
    if (timestamp || elapsed_time)
      {
        now = gettimeofday_us()
        then = thread_time[tid()]

        if (timestamp)
          prefix=sprintf("%s.%06d ", ctime(then/1000000), then%1000000)

        if (elapsed_time && (now>then)) {
          diff = now-then
          suffix=sprintf(" <%d.%06d>", diff/1000000, diff%1000000)
        }

        delete thread_time[tid()]
      }

    /* add a thread-id string in lots of cases, except if
       stap strace.stp -c SINGLE_THREADED_CMD */
    if (tid() != target()) {
      prefix .= sprintf("%s[%d] ", execname(), tid())
    }

    if (used_caps) {
       suffix .= " [Capabilities=" . caps_to_str(used_caps) . "]"
       all_used_caps |= used_caps
       print_syscall = 1
    }		       
    if (missing_caps) {
       suffix .= " missing [Capabilities=" . caps_to_str(missing_caps) . "]"
       all_missing_caps |= missing_caps
       print_syscall = 1
    }		       

    foreach ([type, dev] in accessed_devices) {
      devs .= dev_to_str(type, dev, accessed_devices[type, dev]) . " "
      if (has_devs == 0) {
        has_devs = 1
	print_syscall = 1
	devs = " [DeviceAllow=" . devs
      }
      all_accessed_devices[type, dev] = accessed_devices[type, dev];
    }
    if (has_devs) {
      devs .= "]"
      suffix .= devs
    }

    if (used_afs) {
      suffix .= " [RestrictAddressFamilies=" . afs_to_str(used_afs) . "]"
      all_used_afs |= used_afs
      print_syscall = 1
    }		       
    if (missing_afs) {
      suffix .= " missing [RestrictAddressFamilies=" . afs_to_str(missing_afs) . "]"
      all_missing_afs |= missing_afs
      print_syscall = 1
    }		       

    if (no_memory_deny_write_execute) {
      suffix .= " [MemoryDenyWriteExecute=false]"
      all_memory_deny_write_execute = "false"
    }		       

    if (highwatermark_fsize > old_highwatermark_fsize) {
      suffix .= sprintf(" [FSIZE %d -> %d]", old_highwatermark_fsize, highwatermark_fsize)
      old_highwatermark_fsize = highwatermark_fsize
    }
    if (highwatermark_data > old_highwatermark_data) {
      suffix .= sprintf(" [DATA %d -> %d]", old_highwatermark_data, highwatermark_data)
      old_highwatermark_data = highwatermark_data
    }
    if (highwatermark_stack > old_highwatermark_stack) {
      suffix .= sprintf(" [STACK %d -> %d]", old_highwatermark_stack, highwatermark_stack)
      old_highwatermark_stack = highwatermark_stack
    }
    if (highwatermark_core > old_highwatermark_core) {
      suffix .= sprintf(" [CORE %d -> %d]", old_highwatermark_core, highwatermark_core)
      old_highwatermark_core = highwatermark_core
    }
    if (highwatermark_nofile > old_highwatermark_nofile) {
      suffix .= sprintf(" [NOFILE %d -> %d]", old_highwatermark_nofile, highwatermark_nofile)
      old_highwatermark_nofile = highwatermark_nofile
    }
    if (highwatermark_as > old_highwatermark_as) {
      suffix .= sprintf(" [AS %d -> %d]", old_highwatermark_as, highwatermark_as)
      old_highwatermark_as = highwatermark_as
    }
    if (highwatermark_nproc > old_highwatermark_nproc) {
      suffix .= sprintf(" [NPROC %d -> %d]", old_highwatermark_nproc, highwatermark_nproc)
      old_highwatermark_nproc = highwatermark_nproc
    }
    if (highwatermark_memlock > old_highwatermark_memlock) {
      suffix .= sprintf(" [MEMLOCK %d -> %d]", old_highwatermark_memlock, highwatermark_memlock)
      old_highwatermark_memlock = highwatermark_memlock
    }
    if (highwatermark_sigpending > old_highwatermark_sigpending) {
      suffix .= sprintf(" [SIGPENDING %d -> %d]", old_highwatermark_sigpending, highwatermark_sigpending)
      old_highwatermark_sigpending = highwatermark_sigpending
    }
    if (highwatermark_msgqueue > old_highwatermark_msgqueue) {
      suffix .= sprintf(" [MSGQUEUE %d -> %d]", old_highwatermark_msgqueue, highwatermark_msgqueue)
      old_highwatermark_msgqueue = highwatermark_msgqueue
    }
    if (highwatermark_nice > old_highwatermark_nice) {
      suffix .= sprintf(" [NICE %d -> %d]", old_highwatermark_nice, highwatermark_nice)
      old_highwatermark_nice = highwatermark_nice
    }
    if (highwatermark_rtprio > old_highwatermark_rtprio) {
      suffix .= sprintf(" [RTPRIO %d -> %d]", old_highwatermark_rtprio, highwatermark_rtprio)
      old_highwatermark_rtprio = highwatermark_rtprio
    }
    
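    # Paths written during this syscall: list them and downgrade
    # ProtectSystem/ProtectHome when a protected path was written.
    foreach ([path+] in written_paths) {
      if (has_dirs == 0) {
        has_dirs = 1
        print_syscall = 1
        dirs = " [ReadWriteDirectories="
      }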
      dirs .= path . " "
      all_written_paths[path]++
      if (protect_system == "full" && path == "/etc") {
        protect_system = "true"
        suffix .= " [ProtectSystem=true]"
      } else if (protect_system != "false" && path in protect_system_paths) {
        protect_system = "false"
        suffix .= " [ProtectSystem=false]"
      }
      if (protect_home != "false" && path in protect_home_paths) {
        protect_home = "false"
        suffix .= " [ProtectHome=false]"
      }
    }
    if (has_dirs) {
      dirs .= "]"
      suffix .= dirs
    }

    has_dirs = 0
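    # Paths accessed during this syscall: the ~ prefix marks them as paths
    # that must stay accessible. Any access under a home path downgrades
    # ProtectHome=true to read-only.
    foreach ([path+] in accessed_paths) {
      if (has_dirs == 0) {
        has_dirs = 1
        print_syscall = 1
        dirs = " [InaccessibleDirectories=~"
      }
      dirs .= path . " "
      all_accessed_paths[path]++
      if (protect_home == "true" && path in protect_home_paths) {
        protect_home = "read-only"
        suffix .= " [ProtectHome=read-only]"
      }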
    }
    if (has_dirs) {
      dirs .= "]"
      suffix .= dirs
    }

    if (!only_capability_use || print_syscall)
      printf("%s%s(%s) = %s%s\n",
             prefix,
             syscall_name, syscall_argstr, syscall_retstr,
             suffix)

    used_caps = 0
    missing_caps = 0
    used_afs = 0
    missing_afs = 0
    print_syscall = 0
    no_memory_deny_write_execute = 0
    delete accessed_devices
    delete accessed_paths
    delete written_paths

    delete thread_argstr[tid()]
  }

probe end
  {
    printf("\nSummary:\n")
    printf("CapabilityBoundingSet=%s\n", caps_to_str(all_used_caps))
    if (all_missing_caps)
      printf("# Consider also possibly missing CapabilityBoundingSet=%s\n", caps_to_str(all_missing_caps))
    printf("ProtectHome=%s\n", protect_home)
    printf("ProtectSystem=%s\n", protect_system)
    # No way to analyze if PrivateTmp could be used
    printf("DevicePolicy=strict\n")
    foreach ([type, dev+] in all_accessed_devices)
      printf("DeviceAllow=%s\n", dev_to_str(type, dev, all_accessed_devices[type, dev]))
    printf("# LimitFSIZE=%d\n", highwatermark_fsize)
    printf("# LimitDATA=%d\n", highwatermark_data)
    printf("# LimitSTACK=%d\n", highwatermark_stack)
    printf("# LimitCORE=%d\n", highwatermark_core)
    printf("# LimitNOFILE=%d\n", highwatermark_nofile)
    printf("# LimitAS=%d\n", highwatermark_as)
    printf("# LimitNPROC=%d\n", highwatermark_nproc)
    printf("# LimitMEMLOCK=%d\n", highwatermark_memlock)
    printf("# LimitSIGPENDING=%d\n", highwatermark_sigpending)
    printf("# LimitMSGQUEUE=%d\n", highwatermark_msgqueue)
    printf("# LimitNICE=%d\n", highwatermark_nice)
    printf("# LimitRTPRIO=%d\n", highwatermark_rtprio)
    printf("RestrictAddressFamilies=%s\n", afs_to_str(all_used_afs))
    if (all_missing_afs)
      printf("# Consider also possibly missing RestrictAddressFamilies=%s\n", afs_to_str(all_missing_afs))
    printf("MemoryDenyWriteExecute=%s\n", all_memory_deny_write_execute)
    printf("SystemCallFilter=")
    foreach ([syscall+] in used_syscalls)
      if (syscall in syscalls_for_seccomp)
        printf("%s ", syscalls_for_seccomp[syscall])
      else
        printf("%s ", syscall)

    foreach ([path] in all_accessed_paths)
      if (path in inaccessibles)
        inaccessibles[path] = 1

    foreach ([path+] in inaccessibles)
      if (inaccessibles[path] == 0)
        printf("\nInaccessibleDirectories=-%s", path)

    printf("\nReadOnlyDirectories=/\nReadWriteDirectories=")
    foreach ([path+] in all_written_paths)
      printf("%s ", path)
    printf("\n")
  }
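
# Usage sketch (an assumption, not part of the script's logic): the lines
# printed under "Summary:" are already in unit file syntax, so after review
# they can be pasted into the [Service] section of a drop-in, for example a
# hypothetical /etc/systemd/system/wpa_supplicant.service.d/hardening.conf:
#
#   [Service]
#   RestrictAddressFamilies=AF_UNIX AF_NETLINK
#   MemoryDenyWriteExecute=true
#
# The values above are illustrative only; use what the script reports for
# the traced run. Lines the summary prints with a leading "#" (the Limit*
# high-water marks and the "possibly missing" hints) are suggestions to
# evaluate by hand, not settings to copy blindly.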

