[Intel-gfx] [PATCH v4] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file

Petri Latvala petri.latvala at intel.com
Tue Oct 25 09:01:58 UTC 2016


This patch is now pushed, thanks.


--
Petri Latvala


On Wed, Oct 12, 2016 at 02:32:17PM +0100, Tvrtko Ursulin wrote:
> 
> On 10/10/2016 15:31, Goel, Akash wrote:
> > 
> > 
> > On 10/10/2016 7:22 PM, Tvrtko Ursulin wrote:
> > > 
> > > On 10/10/2016 11:59, akash.goel at intel.com wrote:
> > > > From: Akash Goel <akash.goel at intel.com>
> > > > 
> > > > This patch provides a test utility which helps capture GuC firmware
> > > > logs and
> > > > then dump them to file.
> > > > The logs are pulled from a debugfs file
> > > > '/sys/kernel/debug/dri/guc_log' and
> > > > by default stored into a file 'guc_log_dump.dat'. The name,
> > > > including the
> > > > location, of the output file can be changed through a command line
> > > > argument.
> > > > 
> > > > The utility goes into an infinite loop where it waits for the arrival
> > > > of new
> > > > logs and as soon as new set of logs are produced it captures them in
> > > > its local
> > > > buffer which is then flushed out to the file on disk.
> > > > Any time when logging needs to be ended, User can stop this utility
> > > > (CTRL+C).
> > > > 
> > > > Before entering into a loop, it first discards whatever logs are
> > > > present in
> > > > the debugfs file.
> > > > This way User can first launch this utility and then start a
> > > > workload/activity
> > > > for which GuC firmware logs are to be actually captured and keep
> > > > running the
> > > > utility for as long as its needed, like once the workload is over this
> > > > utility
> > > > can be forcefully stopped.
> > > > 
> > > > If the logging wasn't enabled on GuC side by the Driver at boot time,
> > > > utility
> > > > will first enable the logging and later on when it is stopped (CTRL+C)
> > > > it will
> > > > also pause the logging on GuC side.
> > > > 
> > > > v2:
> > > > - Use combination of alarm system call & SIGALRM signal to run the
> > > > utility
> > > >    for required duration. (Tvrtko)
> > > > - Fix inconsistencies, do minor cleanup and refactoring. (Tvrtko)
> > > > 
> > > > v3:
> > > > - Fix discrepancy for the output file command line option and
> > > > update the
> > > >    Usage/help string.
> > > > 
> > > > v4:
> > > > - Update the exit condition for flusher thread, now will exit
> > > > only after
> > > >    the capture loop is over and not when the flag to stop
> > > > logging is set.
> > > >    This handles a corner case, due to which the dump of last captured
> > > > buffer
> > > >    was getting missed.
> > > > - Add a newline character at the end of assert messages.
> > > > - Avoid the assert for the case, which occurs very rarely, when there
> > > > are no
> > > >    bytes read from the relay file.
> > > > 
> > > > Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> > > > Signed-off-by: Akash Goel <akash.goel at intel.com>
> > > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com> (v3)
> > > > ---
> > > >   tools/Makefile.sources   |   1 +
> > > >   tools/intel_guc_logger.c | 438
> > > > +++++++++++++++++++++++++++++++++++++++++++++++
> > > >   2 files changed, 439 insertions(+)
> > > >   create mode 100644 tools/intel_guc_logger.c
> > > > 
> > > > diff --git a/tools/Makefile.sources b/tools/Makefile.sources
> > > > index 2bb6c8e..be58871 100644
> > > > --- a/tools/Makefile.sources
> > > > +++ b/tools/Makefile.sources
> > > > @@ -19,6 +19,7 @@ tools_prog_lists =        \
> > > >       intel_gpu_time        \
> > > >       intel_gpu_top        \
> > > >       intel_gtt        \
> > > > +    intel_guc_logger        \
> > > >       intel_infoframes    \
> > > >       intel_l3_parity        \
> > > >       intel_lid        \
> > > > diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c
> > > > new file mode 100644
> > > > index 0000000..159a54e
> > > > --- /dev/null
> > > > +++ b/tools/intel_guc_logger.c
> > > > @@ -0,0 +1,438 @@
> > > > +
> > > > +#define _GNU_SOURCE  /* For using O_DIRECT */
> > > > +#include <inttypes.h>
> > > > +#include <stdio.h>
> > > > +#include <fcntl.h>
> > > > +#include <sched.h>
> > > > +#include <sys/ioctl.h>
> > > > +#include <string.h>
> > > > +#include <errno.h>
> > > > +#include <signal.h>
> > > > +#include <stdlib.h>
> > > > +#include <unistd.h>
> > > > +#include <sys/poll.h>
> > > > +#include <sys/wait.h>
> > > > +#include <sys/types.h>
> > > > +#include <sys/syscall.h>
> > > > +#include <sys/mman.h>
> > > > +#include <assert.h>
> > > > +#include <pthread.h>
> > > > +
> > > > +#include "igt.h"
> > > > +
> > > > +#define MB(x) ((uint64_t)(x) * 1024 * 1024)
> > > > +#ifndef PAGE_SIZE
> > > > +  #define PAGE_SIZE 4096
> > > > +#endif
> > > > +/* Currently the size of GuC log buffer is 19 pages & so is the size
> > > > of relay
> > > > + * subbuffer. If the size changes in future, then this define also
> > > > needs to be
> > > > + * updated accordingly.
> > > > + */
> > > > +#define SUBBUF_SIZE (19*PAGE_SIZE)
> > > > +/* Need large buffering from logger side to hide the DISK IO latency,
> > > > Driver
> > > > + * can only store 8 snapshots of GuC log buffer in relay.
> > > > + */
> > > > +#define NUM_SUBBUFS 100
> > > > +
> > > > +#define RELAY_FILE_NAME  "guc_log"
> > > > +#define DEFAULT_OUTPUT_FILE_NAME  "guc_log_dump.dat"
> > > > +#define CONTROL_FILE_NAME "i915_guc_log_control"
> > > > +
> > > > +char *read_buffer;
> > > > +char *out_filename;
> > > > +int poll_timeout = 2; /* by default 2ms timeout */
> > > > +pthread_mutex_t mutex;
> > > > +pthread_t flush_thread;
> > > > +int verbosity_level = 3; /* by default capture logs at max
> > > > verbosity */
> > > > +uint32_t produced, consumed;
> > > > +uint64_t total_bytes_written;
> > > > +int num_buffers = NUM_SUBBUFS;
> > > > +int relay_fd, outfile_fd = -1;
> > > > +uint32_t test_duration, max_filesize;
> > > > +pthread_cond_t underflow_cond, overflow_cond;
> > > > +bool stop_logging, discard_oldlogs, capturing_stopped;
> > > > +
> > > > +static void guc_log_control(bool enable_logging)
> > > > +{
> > > > +    int control_fd;
> > > > +    char data[19];
> > > > +    uint64_t val;
> > > > +    int ret;
> > > > +
> > > > +    control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY);
> > > > +    igt_assert_f(control_fd >= 0, "couldn't open the guc log control
> > > > file\n");
> > > > +
> > > > +    val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0;
> > > > +
> > > > +    ret = snprintf(data, sizeof(data), "0x%" PRIx64, val);
> > > > +    igt_assert(ret > 2 && ret < sizeof(data));
> > > > +
> > > > +    ret = write(control_fd, data, ret);
> > > > +    igt_assert_f(ret > 0, "couldn't write to the log control file\n");
> > > > +
> > > > +    close(control_fd);
> > > > +}
> > > > +
> > > > +static void int_sig_handler(int sig)
> > > > +{
> > > > +    igt_info("received signal %d\n", sig);
> > > > +
> > > > +    stop_logging = true;
> > > > +}
> > > > +
> > > > +static void pull_leftover_data(void)
> > > > +{
> > > > +    unsigned int bytes_read = 0;
> > > > +    int ret;
> > > > +
> > > > +    do {
> > > > +        /* Read the logs from relay buffer */
> > > > +        ret = read(relay_fd, read_buffer, SUBBUF_SIZE);
> > > > +        if (!ret)
> > > > +            break;
> > > > +
> > > > +        igt_assert_f(ret > 0, "failed to read from the guc log
> > > > file\n");
> > > > +        igt_assert_f(ret == SUBBUF_SIZE, "invalid read from relay
> > > > file\n");
> > > > +
> > > > +        bytes_read += ret;
> > > > +
> > > > +        if (outfile_fd >= 0) {
> > > > +            ret = write(outfile_fd, read_buffer, SUBBUF_SIZE);
> > > > +            igt_assert_f(ret == SUBBUF_SIZE, "couldn't dump the logs
> > > > in a file\n");
> > > > +            total_bytes_written += ret;
> > > > +        }
> > > > +    } while(1);
> > > > +
> > > > +    igt_debug("%u bytes flushed\n", bytes_read);
> > > > +}
> > > > +
> > > > +static int num_filled_bufs(void)
> > > > +{
> > > > +    return (produced - consumed);
> > > > +}
> > > > +
> > > > +static void pull_data(void)
> > > > +{
> > > > +    char *ptr;
> > > > +    int ret;
> > > > +
> > > > +    pthread_mutex_lock(&mutex);
> > > > +    while (num_filled_bufs() >= num_buffers) {
> > > > +        igt_debug("overflow, will wait, produced %u, consumed %u\n",
> > > > produced, consumed);
> > > > +        /* Stall the main thread in case of overflow, as there are no
> > > > +         * buffers available to store the new logs, otherwise there
> > > > +         * could be corruption if both threads work on the same
> > > > buffer.
> > > > +         */
> > > > +        pthread_cond_wait(&overflow_cond, &mutex);
> > > > +    };
> > > > +    pthread_mutex_unlock(&mutex);
> > > > +
> > > > +    ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE;
> > > > +
> > > > +    /* Read the logs from relay buffer */
> > > > +    ret = read(relay_fd, ptr, SUBBUF_SIZE);
> > > > +    igt_assert_f(ret >= 0, "failed to read from the guc log file\n");
> > > > +    igt_assert_f(!ret || ret == SUBBUF_SIZE, "invalid read from relay
> > > > file\n");
> > > > +
> > > > +    if (ret) {
> > > > +        pthread_mutex_lock(&mutex);
> > > > +        produced++;
> > > > +        pthread_cond_signal(&underflow_cond);
> > > > +        pthread_mutex_unlock(&mutex);
> > > > +    } else {
> > > > +        /* Occasionally (very rare) read from the relay file
> > > > returns no
> > > > +         * data, albeit the polling done prior to read call indicated
> > > > +         * availability of data.
> > > > +         */
> > > > +        igt_debug("no data read from the relay file\n");
> > > 
> > > This worries me. Relayfs kernel side is quite simple which would suggest
> > > there is maybe a bug in the logger? And it is very important that we
> > > trust logger correctly capturing data. Because people will spend a lot
> > > of time debugging issues from the logs and it would be bad to have to
> > > chase irregularities caused by a problem in the logger.
> > > 
> > Actually I see this issue rarely and only after logging some GBs of
> > data. And most likely it won't have any bearing on the correctness of
> > captured data, the missed snapshot shall be pulled in the next iteration
> > of capture loop. Also no-overwrite mode of relay is being used.
> > 
> > As per my hunch, it can't be ruled out that this is not a relayfs issue.
> > 
> > > Could you please try and identify why exactly is this happening?
> > Will try to investigate both from relayfs side as well as logger side,
> > but it would take some time.
> > This may not be considered a blocker.
> > 
> 
> As you reported off list, after some deep investigation there seems to be no
> potential for lost packets. "Empty wakeup" is just a consequence of
> unsynchronized nature of relayfs.
> 
> In the light of that I am happy with the logger, so:
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Regards,
> 
> Tvrtko
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


More information about the Intel-gfx mailing list