[systemd-devel] [PATCH] systemd: add RDTCacheReservation= option to support CAT (Cache Allocation Technology)
Marcelo Tosatti
mtosatti at redhat.com
Fri Jan 6 16:13:36 UTC 2017
On Fri, Jan 06, 2017 at 01:51:11PM -0200, Marcelo Tosatti wrote:
> On Fri, Jan 06, 2017 at 05:26:36PM +0200, Mantas Mikulėnas wrote:
> > On Fri, Jan 6, 2017 at 3:59 PM, Marcelo Tosatti <mtosatti at redhat.com> wrote:
> >
> > >
> > >
> > > Cache Allocation Technology is a feature on selected recent Intel Xeon
> > > processors which allows control over L3 cache allocation.
> > >
> > > Kernel support has been merged to the upstream kernel, via a filesystem
> > > resctrlfs.
> > >
> > > On top of that, a userspace utility, resctrltool has been written
> > > to facilitate writing applications and using the filesystem
> > > interface (see the rationale at
> > > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1300792.html).
> > >
> > > This patch adds a new option to systemd, RDTCacheReservation,
> > > to allow configuration of CAT via resctrltool.
> > >
> > > See the first hunk of the patch for a description of the option
> >
> >
> > This really doesn't look pretty, neither the approach nor the
> > implementation...
>
> Suggestions to improve the code or the approach are welcome.
>
> > Is the option actually so complex that calling resctrltool is the only way
> > to adjust it? What about writing to the resctrlfs directly?
>
> You'll have to deal with the issues that resctrltool deals with,
> namely:
>
> 1) Filesystem locking.
> 2) Reading in every directory and the default
> directory.
> 3) Converting the reservation request to proper sizes.
> 4) Converting:
> type=both --> type=data/type=code
>
> type=data/type=code --> type=both
>
> 4) Finding free space for the reservation.
> 5) Adjusting the default group reservation.
>
> Since this steps must be performed by every user of
> CAT (including libvirt which plans to execute resctrltool
> as well), it was decided its better to maintain this logic
> in a centralized place.
>
> Why do you prefer writing to to resctrlfs directly?
>
> > Also, it would be nicer to submit the patches as GitHub pull requests
> > instead.
>
> Sure can do that after the reviews... Thanks for your comments!
>
> >
> > >
> > > +static char* convert_rdt_back(char *argm[], int argmidx)
> > > +{
> > > + int i, ret;
> > > + char *buf, *str;
> > > + int size = 0;
> > > + int printcomma = 0;
> > > + int printsemicolon = 0;
> > > + int localmode = 0;
> > > +
> > > + for (i=0; i < argmidx; i++) {
> > > + /* ',', ';', '=' */
> > > + size += strlen(argm[i]) + 3;
> > > + }
> > > +
> > > + buf = malloc(size);
> > > + if (buf == NULL)
> > > + return NULL;
> > > + memset(buf, 0, size);
> > > +
> > > + str = buf;
> > > +
> > > + /* cache-id specified */
> > > + for (i=0; i < argmidx; i++) {
> > > + if (strlen(argm[i]) == 10) {
> > > + if (strcmp(argm[i], "--cache-id") == 0)
> > > + localmode = 1;
> > > + }
> > > + }
> > > +
> > > + for (i=0; i<argmidx; i++) {
> > > + char *cur = argm[i];
> > > +
> > > + if (strlen(cur) == 0)
> > > + return buf;
> > > +
> > > + if (strlen(cur) == 6) {
> > > + if (strcmp(cur, "--type") == 0) {
> > > + ret = sprintf(str, "type=");
> > > + str = str + ret;
> > > + printcomma = 1;
> > > + continue;
> > > + }
> > > + etc.
> > > + }
> > > +
> > > + return buf;
> > > +}
> > > +
> > >
> >
> > Converting from foo;bar to argv[] and back is a bit horrible.
>
> Its a very simple function.
>
> > It seems to
> > me that it would be *much* simpler and easier to maintain if you stored
> > only actual parameters in the unit (e.g. as an array of
> > {type,size,cache_id}) and only generated argv at the moment it was
> > necessary, i.e. right before spawning resctrltool. After all, that's the
> > only place you need it. *And* that way you wouldn't need the above function
> > at all.
>
> Honestly:
>
> An array of {type,size,cache_id}
>
> VS
>
> A list of:
>
> "--type" "type" "--size" size "--cache-id" id
>
> Is essentially the same thing to me.
>
> And you would have to convert back to the
>
> "type=type,size=size,cache-id=cacheid"
>
> format anyway.
>
> So i don't see any actual advantage of your suggestion
> (other than personal preference).
>
> But sure, i'll convert it to an array of
>
> {type,size,cache_id}.
Well, i take that back. Its the same data, just stored in
a different way. So truly this is personal preference and
not a technical argument.
Can you give me one technical advantage of storing the data
as {type,size,cache_id} ?
Disadvantages:
1) you have to parse "mb" "kb" "gb" (documentation
lacks to mention that).
2) you have to perform the following convertions:
type=type,size=size,cache-id=cache -->
array of [type,size,cacheid] -->
character list of
"--type" type "--size" size --cache-id "cache-id"
Whereas now you perform
type=type,size=size,cache-id=cache -->
character list of
"--type" type "--size" size --cache-id "cache-id"
So your suggestion actually complicates the code
and does not simplify it.
If the systemd committer disagrees, i'll rewrite.
Unrelated: can you please remind me as to where a memory
leak in the current parser is?
> > > Index: systemd/src/core/execute.c
> > > ===================================================================
> > > --- systemd.orig/src/core/execute.c
> > > +++ systemd/src/core/execute.c
> > > @@ -36,6 +36,10 @@
> > > #include <sys/un.h>
> > > #include <unistd.h>
> > > #include <utmpx.h>
> > > +#include <fcntl.h>
> > > +#include <sys/select.h>
> > > +#include <sys/types.h>
> > > +#include <sys/wait.h>
> > >
> > > #ifdef HAVE_PAM
> > > #include <security/pam_appl.h>
> > > @@ -2201,6 +2205,161 @@ static int apply_working_directory(const
> > > return 0;
> > > }
> > >
> > > +
> > > +static int fork_exec_resctrltool(Unit *u, char *argv[])
> > > +{
> > > + int outpipe[2];
> > > + int errpipe[2];
> > > + pid_t cpid;
> > > +
> > > + pipe(outpipe);
> > > + pipe(errpipe);
> > > + cpid = fork();
> > > +
> > > + if(cpid == 0) {
> > > + int r;
> > > +
> > > + dup2(errpipe[1], STDERR_FILENO);
> > > + dup2(outpipe[1], STDOUT_FILENO);
> > > +
> > > + r = execve(argv[0], argv, NULL);
> > > + if (r == -1) {
> > > + log_open();
> > > + log_unit_error_errno(u, r, "Failed to execve
> > > resctrltool, ignoring: %m");
> > > + log_close();
> > > + return -errno;
> > > + }
> > > + } else {
> > > + char buffer[100];
> > > + fd_set fdset;
> > > + int count;
> > > + int ret;
> > > + int nfds;
> > > + int status = 0;
> > > + int retst;
> > > +
> > > + ret = waitpid(cpid, &status, 0);
> > > + if (ret == -1) {
> > > + log_open();
> > > + log_unit_error_errno(u, ret, "Failed to waitpid
> > > resctrltool, ignoring: %m");
> > > + log_close();
> > > + return -errno;
> > > + }
> > > +
> > > + if (!WIFEXITED(status)) {
> > > + log_open();
> > > + log_unit_error_errno(u, ret, "resctrltool exits
> > > abnormally, ignoring: %m");
> > > + log_close();
> > > + return -1;
> > > + } else {
> > > + retst = WEXITSTATUS(status);
> > > +
> > > + if (retst == 0) {
> > > + return 0;
> > > + }
> > > + }
> > > +
> > > + /*
> > > + * error, read stderr and stdout and log.
> > > + */
> > > +
> > > + FD_ZERO(&fdset);
> > > + FD_SET(outpipe[0], &fdset);
> > > + FD_SET(errpipe[0], &fdset);
> > > +
> > > + if (outpipe[0] > errpipe[0])
> > > + nfds = outpipe[0] + 1;
> > > + else
> > > + nfds = errpipe[0] + 1;
> > > +
> > > + ret = select(nfds, &fdset, NULL, NULL, NULL);
> > > + if (ret == -1) {
> > > + log_open();
> > > + log_unit_error_errno(u, ret, "select error,
> > > ignoring RDTCacheReservation: %m");
> > > + log_close();
> > > + return -1;
> > > + }
> > > +
> > > + if (FD_ISSET(outpipe[0], &fdset)) {
> > > + count = read(outpipe[0], buffer,
> > > sizeof(buffer)-1);
> > > + if (count >= 0) {
> > > + buffer[count] = 0;
> > > + log_open();
> > > + log_unit_error(u, "resctrltool stdout:
> > > %s", buffer);
> > > + log_close();
> > > + } else {
> > > + log_open();
> > > + log_unit_error(u, "resctrltool io
> > > error\n");
> > > + log_close();
> > > + }
> > > + }
> > > +
> > > + if (FD_ISSET(errpipe[0], &fdset)) {
> > > + count = read(errpipe[0], buffer,
> > > sizeof(buffer)-1);
> > > + if (count >= 0) {
> > > + buffer[count] = 0;
> > > + log_open();
> > > + log_unit_error(u, "resctrltool stderr:
> > > %s", buffer);
> > > + log_close();
> > > + } else {
> > > + log_open();
> > > + log_unit_error(u, "resctrltool io
> > > error\n");
> > > + log_close();
> > > + }
> > > + }
> > > +
> > > + return retst;
> > > + }
> > > +
> > > + return -1;
> > > +}
> > > +
> > >
> >
> > It rather looks to me like your pipes are leaking?
>
> Yes, will close them.
>
> >
> >
> > > +static int setup_rdt_cat(Unit *u, const ExecContext *c, pid_t pid)
> > > +{
> > > + int ret, i;
> > > + const char *arg[5];
> > > + char pidstr[100];
> > > + char *name = u->id;
> > > + const char *largm[ARGMSIZE + 4];
> > >
> >
> > What's the difference between arg and largm?
>
> None, can unify them.
>
> > > +
> > > + sprintf(pidstr, "%d", pid);
> > >
> >
> > Process IDs are unsigned.
>
> Fixed.
>
> > > + arg[0] = "/usr/bin/resctrltool";
> > > + arg[1] = "delete";
> > > + arg[2] = name;
> > > + arg[3] = 0;
> > > +
> > > + ret = fork_exec_resctrltool(u, (char **)arg);
> > > + /* we want to ignore 'reservation does not exist'
> > > + * errors, so skip error checking entirely.
> > > + * any other errors will be caught when trying
> > > + * to execute create below
> > > + */
> > > +
> > > + memset(largm, 0, sizeof(largm));
> > > + largm[0] = "/usr/bin/resctrltool";
> > > + largm[1] = "create";
> > > + largm[2] = "--name";
> > > + largm[3] = name;
> > > + for (i = 0; i < c->argmidx; i++)
> > > + largm[i+4] = c->argm[i];
> > > +
> > > + ret = fork_exec_resctrltool(u, (char **)largm);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + arg[0] = "/usr/bin/resctrltool";
> > > + arg[1] = "assign";
> > > + arg[2] = pidstr;
> > > + arg[3] = name;
> > > + arg[4] = 0;
> > > + ret = fork_exec_resctrltool(u, (char **)arg);
> > >
> >
> > Three calls per process seems excessive; couldn't resctrltool have a
> > dedicated subcommand for delete+create+assign?
>
> Hum, i think these are separate commands... its awkward
> to create a "franken"-command to speed up execution.
>
> Moreover i doubt this is a performance issue...
>
> > > +#define STATE_NEXT_TYPE 0
> > > +#define STATE_NEXT_SIZE 1
> > > +#define STATE_NEXT_CACHEID_OR_END 2
> > > +
> > > +static void free_argm(ExecContext *c)
> > > +{
> > > + int i;
> > > +
> > > + for (i=0; i < c->argmidx; i++) {
> > > + free(c->argm[i]);
> > > + c->argm[i] = NULL;
> > > + }
> > > + c->argmidx = 0;
> > > +}
> > > +
> > > +int config_parse_rdt_cache_reservation(const char* unit,
> > > + const char *filename,
> > > + unsigned line,
> > > + const char *section,
> > > + unsigned section_line,
> > > + const char *lvalue,
> > > + int ltype,
> > > + const char *rvalue,
> > > + void *data,
> > > + void *userdata) {
> > > +
> > > + ExecContext *c = data;
> > > + int curstate = STATE_NEXT_TYPE;
> > > + char *buf = (char *)rvalue;
> > > +
> > > + assert(filename);
> > > + assert(lvalue);
> > > + assert(rvalue);
> > > + assert(data);
> > > +
> > > + if (isempty(rvalue)) {
> > > + /* An empty assignment resets the list */
> > > + free_argm(c);
> > > + return 0;
> > > + }
> > > +
> > > + do {
> > > + if (c->argmidx > ARGMSIZE - 3) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, ENOMEM,
> > > + "argmidx overflow, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > +
> > > + switch (curstate) {
> > > + case STATE_NEXT_TYPE: {
> > > + char *string, *buf2;
> > > +
> > > + buf = strstr(buf, "type=");
> > > + if (buf == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "type= not found when expected, ignoring: %s",
> > > rvalue);
> > > + return 0;
> > > + }
> > > +
> > > + buf2 = strstr(buf, ",");
> > > + if (buf2 == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + ", not found when parsing type, ignoring: %s",
> > > rvalue);
> > > + return 0;
> > > + }
> > > + if (strlen(buf) <= 5) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "invalid type, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > + /* skip type= */
> > > + buf = buf + 5;
> > > + if (strlen(buf) < 4) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "invalid type, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > + string = strndup(buf, 4);
> > > + if (string == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, ENOMEM,
> > > + "enomem for strndup, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > +
> > > + c->argm[c->argmidx++] = strdup("--type");
> > > + c->argm[c->argmidx++] = string;
> > > +
> > > + /* skip {code,data,both}, */
> > > + buf = buf + 5;
> > > + curstate = STATE_NEXT_SIZE;
> > > + break;
> > > + }
> > > + case STATE_NEXT_SIZE: {
> > > + char *string, *buf2, *commabuf, *dotcommabuf;
> > > + int len, skip = 0;
> > > +
> > > + buf = strstr(buf, "size=");
> > > + if (buf == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "size= not found when expected, ignoring: %s",
> > > rvalue);
> > > + return 0;
> > > + }
> > > + if (strlen(buf) <= 5) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "invalid size, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > + /* skip size= */
> > > + buf = buf + 5;
> > > +
> > > + commabuf = strstr(buf, ",");
> > > + dotcommabuf = strstr(buf, ";");
> > > +
> > > + buf2 = NULL;
> > > +
> > > + /* the end, neither , or ; follow */
> > > + if (commabuf == NULL && dotcommabuf == NULL) {
> > > + buf2 = buf + strlen(buf);
> > > + skip = 0;
> > > + /* only ',': cache-id must follow */
> > > + } else if (commabuf && dotcommabuf == NULL) {
> > > + buf2 = commabuf;
> > > + skip = 1;
> > > + /* trailing ';' at the end */
> > > + } else if (commabuf == NULL && dotcommabuf) {
> > > + buf2 = dotcommabuf;
> > > + skip = 1;
> > > + } else if (commabuf && dotcommabuf) {
> > > + if (commabuf > dotcommabuf)
> > > + buf2 = dotcommabuf;
> > > + else
> > > + buf2 = commabuf;
> > > + skip = 1;
> > > + }
> > > + len = buf2 - buf;
> > > + string = strndup(buf, len);
> > > + if (string == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, ENOMEM,
> > > + "enomem for strndup, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > +
> > > + c->argm[c->argmidx++] = strdup("--size");
> > > + c->argm[c->argmidx++] = string;
> > > + curstate = STATE_NEXT_CACHEID_OR_END;
> > > +
> > > + buf = buf2;
> > > + buf = buf + skip;
> > > + break;
> > > + }
> > > + case STATE_NEXT_CACHEID_OR_END: {
> > > + char *buf2, *string, *dotcommabuf;
> > > + int len, skip = 0;
> > > +
> > > + if (strlen(buf) < 4)
> > > + break;
> > > +
> > > + if (strncmp(buf, "type", 4) == 0) {
> > > + curstate = STATE_NEXT_TYPE;
> > > + break;
> > > + }
> > > +
> > > + buf = strstr(buf, "cache-id=");
> > > + if (buf == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "cache-id= not found when expected, ignoring:
> > > %s", rvalue);
> > > + return 0;
> > > + }
> > > + if (strlen(buf) <= 9) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, EINVAL,
> > > + "invalid cache-id=, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > + /* skip cache-id= */
> > > + buf = buf + 9;
> > > +
> > > + dotcommabuf = strstr(buf, ";");
> > > + /* the end */
> > > + if (dotcommabuf == NULL) {
> > > + buf2 = buf + strlen(buf);
> > > + skip = 0;
> > > + } else {
> > > + buf2 = dotcommabuf;
> > > + skip = 1;
> > > + }
> > > + len = buf2 - buf;
> > > + string = strndup(buf, len);
> > > + if (string == NULL) {
> > > + free_argm(c);
> > > + log_syntax(unit, LOG_ERR, filename, line, ENOMEM,
> > > + "enomem for strndup, ignoring: %s", rvalue);
> > > + return 0;
> > > + }
> > > +
> > > + c->argm[c->argmidx++] = strdup("--cache-id");
> > > + c->argm[c->argmidx++] = string;
> > > +
> > > + buf = buf + skip;
> > > + curstate = STATE_NEXT_TYPE;
> > > + break;
> > > + }
> > > + default:
> > > + break;
> > > + }
> > > + } while (strlen(buf) > 4);
> > > +
> > > + c->argm[c->argmidx] = 0;
> > > +
> > > + c->rdt_cache_reservation_set = true;
> > > +
> > > + return 0;
> > > +}
> > >
> >
> > This looks like it could be simplified *a lot* by using the strv functions,
> > e.g. strv_split(buf, ";") and just nicely iterating over each reservation.
I didnt know about strv_split... Will take a look at simplifying the
code with it.
More information about the systemd-devel
mailing list