[systemd-devel] [PATCH] systemd: add RDTCacheReservation= option to support CAT (Cache Allocation Technology)
Marcelo Tosatti
mtosatti at redhat.com
Mon Jan 9 11:10:46 UTC 2017
On Fri, Jan 06, 2017 at 07:51:05PM +0100, Lennart Poettering wrote:
> On Fri, 06.01.17 11:59, Marcelo Tosatti (mtosatti at redhat.com) wrote:
>
> > Cache Allocation Technology is a feature on selected recent Intel Xeon
> > processors which allows control over L3 cache allocation.
>
> What precisely is the benefit of making this configurable? Can you
> describe a basic usecase why I'd want to set this on some service?
http://www.intel.com/content/www/us/en/communications/cache-allocation-technology-white-paper.html
"On systems with multiple workloads running simultaneously, which
heavily utilize the cache; system administrators can utilize CAT to
improve performance. CAT can be used to better utilize the caches and
as a result some applications are more likely to hit the cache which
results in fewer accesses to system memory. This results in lower
latency and greater performance."
--------
For example, with a job with large memory reaccess distances (where
caching hardly brings any benefit), you can limit that job to reclaim a
small portion of L3 cache (so you don't throw away data from other jobs
from L3 cache).
>
> > Index: systemd/man/systemd.exec.xml
> > ===================================================================
> > --- systemd.orig/man/systemd.exec.xml
> > +++ systemd/man/systemd.exec.xml
>
> Hmm, this doesn't look like a git patch? We generally prefer
> submissions as github PRs these days, btw.
>
> > @@ -233,6 +233,31 @@
> > </varlistentry>
> >
> > <varlistentry>
> > + <term><varname>RDTCacheReservation=</varname></term>
> > +
> > + <listitem><para>Creates a L3 CAT reservation in resctrlfs
> > + for the executed processes. Takes a ";" separated list of
> > + tuples (optionally triples) containing type,size and
> > + optionally cacheid:
>
> if I read this, then I am puzzled what "RDT Cache" is, what "L3 CAT"
> is. Can you explain?
RDT: Resource Director Technology(RDT)
CAT: Cache Allocation Technology (CAT), a subcomponent of RDT
(which includes other things such as CMT: Cache Monitoring Technology).
>
> > + type=type,size=size,cacheid=id;
> > + type=type,size=size,cacheid=id;...
> > +
> > + Where:
> > +
> > + type is one of: both, data, code.
> > + size is size in kbytes.
> > + cacheid (optional) is the cacheid for
> > + the reservation.
>
> what is a "cacheid"?
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence, there may be gaps). To find the ID for each
logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
>
> > +
> > + Rules for the parameters:
> > + * type=code must be paired with type=data entry.
> > +
> > + See the output of resctrltool for more details.
> > +
> > + </para></listitem>
> > + </varlistentry>
>
> This syntax doesn't appear to be in line with how we'd do this in
> systemd usually. Ideally, we try to hide the fact that our settings
> syntax is one-dimensional as much as we can. Hence, a syntax like this
> sounds more appropriate to me:
>
> RDTCacheReservation=4K code
> RDTCacheReservation=16K data
>
> and so on...
What you suggest is to essentially to split
RDTCacheReservation=type=type,size=size,cacheid=id;type=type,size=size,cacheid=id
Into
RDTCacheReservation=type=type,size=size,cacheid=id
RDTCacheReservation=type=type,size=size,cacheid=id
(yes sure can get rid of the commas as well, looks better).
> But then again, I have no idea what all of this actually really is,
> hence I am not sure my proposed syntax makes much sense.
>
> sizes are generally parsed with parse_bytes() in systemd, so that K,
> M, G suffixes are properly recognized...
>
> > + ExecContext *c = userdata;
> > + int r;
> > + int i;
> > +
> > + assert(bus);
> > + assert(reply);
> > + assert(c);
> > +
> > + r = sd_bus_message_open_container(reply, 'a', "s");
> > + if (r < 0)
> > + return r;
> > +
> > + for (i = 0; i < c->argmidx; i++) {
> > + r = sd_bus_message_append(reply, "s", c->argm[i]);
> > + if (r < 0)
> > + return r;
> > + }
>
> indentation is borked. Please indent to multiples of 8ch.
>
> > +static char* convert_rdt_back(char *argm[], int argmidx)
> > +{
>
> Please follow coding style, place opening bracket on same name as
> funciton name.
>
> > + int i, ret;
> > + char *buf, *str;
> > + int size = 0;
> > + int printcomma = 0;
> > + int printsemicolon = 0;
> > + int localmode = 0;
>
> Please use C99 "bool" when you actually mean a boolean.
>
> > +
> > + for (i=0; i < argmidx; i++) {
> > + /* ',', ';', '=' */
> > + size += strlen(argm[i]) + 3;
> > + }
> > +
> > + buf = malloc(size);
> > + if (buf == NULL)
> > + return NULL;
> > + memset(buf, 0, size);
>
> Use "buf = new0(char, size)" for allocations of this kind.
>
> > +
> > + str = buf;
> > +
> > + /* cache-id specified */
> > + for (i=0; i < argmidx; i++) {
> > + if (strlen(argm[i]) == 10) {
> > + if (strcmp(argm[i], "--cache-id") == 0)
> > + localmode = 1;
> > + }
> > + }
>
> Please use "streq()" instead of strcmp() to compare strings for equality.
>
> > +
> > + for (i=0; i<argmidx; i++) {
> > + char *cur = argm[i];
> > +
> > + if (strlen(cur) == 0)
> > + return buf;
> > +
> > + if (strlen(cur) == 6) {
> > + if (strcmp(cur, "--type") == 0) {
> > + ret = sprintf(str, "type=");
> > + str = str + ret;
> > + printcomma = 1;
> > + continue;
> > + }
> > +
> > + if (strcmp(cur, "--size") == 0) {
> > + ret = sprintf(str, "size=");
> > + str = str + ret;
> > + if (localmode == 1)
> > + printcomma = 1;
> > + else
> > + printsemicolon = 1;
> > +
> > + continue;
> > + }
> > + }
> > +
> > + if (strlen(cur) == 10) {
> > + if (strcmp(cur, "--cache-id") == 0) {
> > + ret = sprintf(str, "cache-id=");
> > + str = str + ret;
> > + printsemicolon = 1;
> > + continue;
> > + }
> > + }
> > +
> > + ret = sprintf(str, "%s", cur);
> > + str = str + ret;
> > +
> > + if (i != argmidx - 1) {
> > + if (printsemicolon) {
> > + ret = sprintf(str, ";");
> > + str = str + 1;
> > + } else if (printcomma) {
> > + ret = sprintf(str, ",");
> > + str = str + 1;
> > + }
> > +
> > + printcomma = 0;
> > + printsemicolon = 0;
> > + }
> > + }
> > +
> > + return buf;
> > +}
> > +
> > int bus_exec_context_set_transient_property(
> > Unit *u,
> > ExecContext *c,
> > @@ -1377,6 +1492,46 @@ int bus_exec_context_set_transient_prope
> > }
> >
> > return 1;
> > + } else if (streq(name, "RDTCacheReservation")) {
> > +
> > + r = sd_bus_message_enter_container(message, 'a', "s");
> > + if (r < 0)
> > + return r;
> > +
> > + while ((r = sd_bus_message_enter_container(message, 'r', "s")) > 0) {
> > + const char *str;
> > +
> > + r = sd_bus_message_read(message, "s", &str);
> > + if (r < 0)
> > + return r;
> > +
> > + if (mode != UNIT_CHECK)
> > + c->argm[c->argmidx++] = strdup(str);
>
> Indentation borked. Missing OOM check.
>
> > +
> > + r = sd_bus_message_exit_container(message);
> > + if (r < 0)
> > + return r;
> > + }
> > +
> > + if (mode != UNIT_CHECK) {
> > + char *conv;
> > +
> > + conv = convert_rdt_back(c->argm, c->argmidx);
> > + if (conv == NULL) {
> > + return sd_bus_error_setf(error,
> > + SD_BUS_ERROR_NO_MEMORY,
> > + "out of memory for rdt string
> > convertion");
>
> you may return -ENOMEM here, the right thing will happen.
>
> > + }
> > +
> > + unit_write_drop_in_private_format(u, mode, name, "RDTCacheReservation=%s", conv);
> > + free(conv);
> > + }
>
> indentation really borked.
>
>
> > +static int fork_exec_resctrltool(Unit *u, char *argv[])
> > +{
>
> coding style... opening bracket on same name as function.
>
> > + int outpipe[2];
> > + int errpipe[2];
> > + pid_t cpid;
> > +
> > + pipe(outpipe);
> > + pipe(errpipe);
>
> missing error checks.
>
> > + cpid = fork();
>
> Umpff... This is not how we work in systemd. We do not glue things
> together form the various system tools we shell out to. Sorry, but
> this isn't OK to add this way.
Fine, will write a C library.
> If access to the reservation file system is straight-forward, please
> add some glue code for it to src/basic/ or so, so that we can do the
> accesses directly. Or, alternatively, prepare a proper C library that
> we can link against, maintained as part of resctrltool.
>
> Sorry, but this is a total blocker. Either we do the fs access
> oursvles, or we call into a properly designed C library we link
> against, but shelling out external tools is not an option.
>
> > + if(cpid == 0) {
> > + int r;
> > +
> > + dup2(errpipe[1], STDERR_FILENO);
> > + dup2(outpipe[1], STDOUT_FILENO);
>
> error checking...
>
> > +
> > + r = execve(argv[0], argv, NULL);
> > + if (r == -1) {
> > + log_open();
> > + log_unit_error_errno(u, r, "Failed to execve resctrltool, ignoring: %m");
> > + log_close();
> > + return -errno;
> > + }
> > + } else {
> > + char buffer[100];
>
> arbitrarily sized buffers... are urks. please use LINE_MAX or so at least....
>
> > + fd_set fdset;
> > + int count;
> > + int ret;
> > + int nfds;
> > + int status = 0;
> > + int retst;
> > +
> > + ret = waitpid(cpid, &status, 0);
> > + if (ret == -1) {
> > + log_open();
> > + log_unit_error_errno(u, ret, "Failed to waitpid resctrltool, ignoring: %m");
> > + log_close();
> > + return -errno;
> > + }
> > +
> > + if (!WIFEXITED(status)) {
> > + log_open();
> > + log_unit_error_errno(u, ret, "resctrltool exits abnormally, ignoring: %m");
> > + log_close();
> > + return -1;
> > + } else {
> > + retst = WEXITSTATUS(status);
> > +
> > + if (retst == 0) {
> > + return 0;
> > + }
> > + }
> > +
> > + /*
> > + * error, read stderr and stdout and log.
> > + */
> > +
> > + FD_ZERO(&fdset);
> > + FD_SET(outpipe[0], &fdset);
> > + FD_SET(errpipe[0], &fdset);
> > +
> > + if (outpipe[0] > errpipe[0])
> > + nfds = outpipe[0] + 1;
> > + else
> > + nfds = errpipe[0] + 1;
> > +
> > + ret = select(nfds, &fdset, NULL, NULL, NULL);
>
> select() is not an option. We regularly have to deal with fds > 1024,
> and select() doesn't support those. it's not just slow, it simply
> won't work.
>
> Really, select() is obsolete IRL.
>
> > + if (ret == -1) {
> > + log_open();
> > + log_unit_error_errno(u, ret, "select error, ignoring RDTCacheReservation: %m");
> > + log_close();
> > + return -1;
> > + }
> > +
> > + if (FD_ISSET(outpipe[0], &fdset)) {
> > + count = read(outpipe[0], buffer, sizeof(buffer)-1);
> > + if (count >= 0) {
> > + buffer[count] = 0;
> > + log_open();
> > + log_unit_error(u, "resctrltool stdout: %s", buffer);
> > + log_close();
> > + } else {
> > + log_open();
> > + log_unit_error(u, "resctrltool io error\n");
> > + log_close();
> > + }
> > + }
> > +
> > + if (FD_ISSET(errpipe[0], &fdset)) {
> > + count = read(errpipe[0], buffer, sizeof(buffer)-1);
> > + if (count >= 0) {
> > + buffer[count] = 0;
> > + log_open();
> > + log_unit_error(u, "resctrltool stderr: %s", buffer);
> > + log_close();
> > + } else {
> > + log_open();
> > + log_unit_error(u, "resctrltool io error\n");
> > + log_close();
> > + }
> > + }
> > +
> > + return retst;
>
>
> This will deadlock if the tool prints more than PIPE_BUF output...
>
> > + }
> > +
> > + return -1;
>
> Please do not make up fake error codes such as "-1". We pretty
> universally follow kernel style "return -EFOOBAR".
>
> > +}
> > +
> > +static int setup_rdt_cat(Unit *u, const ExecContext *c, pid_t pid)
> > +{
> > + int ret, i;
> > + const char *arg[5];
> > + char pidstr[100];
>
> fixed size arrays... If you only write a pid into this, then use DECIMAL_STR_MAX(pid_t)...
>
> > + char *name = u->id;
> > + const char *largm[ARGMSIZE + 4];
> > +
> > + sprintf(pidstr, "%d", pid);
>
> Use PID_FMT for this...
>
> > +
> > + arg[0] = "/usr/bin/resctrltool";
> > + arg[1] = "delete";
> > + arg[2] = name;
> > + arg[3] = 0;
> > +
> > + ret = fork_exec_resctrltool(u, (char **)arg);
> > + /* we want to ignore 'reservation does not exist'
> > + * errors, so skip error checking entirely.
> > + * any other errors will be caught when trying
> > + * to execute create below
> > + */
> > +
> > + memset(largm, 0, sizeof(largm));
>
> Use memzero()...
>
> Please have a look at CODING_STYLE.
>
> And please: the shelling out is not acceptable. Sorry!
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
More information about the systemd-devel
mailing list