[Intel-gfx] [PATCH] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Sep 6 13:17:14 UTC 2016
Hi,
On 06/09/16 11:43, akash.goel at intel.com wrote:
> From: Akash Goel <akash.goel at intel.com>
>
> This patch provides a test utility which helps capture GuC firmware logs and
> then dump them to file.
> The logs are pulled from a debugfs file '/sys/kernel/debug/dri/guc_log' and
> stored into a file '/tmp/guc_log_dump.dat', the name of the output file can
> be changed through a command line argument.
>
> The utility goes into an infinite loop where it waits for the arrival of new
> logs and as soon as new set of logs are produced it captures them in its local
> buffer which is then flushed out to the file on disk.
> Any time when logging needs to be ended, User can stop this utility (CTRL+C).
>
> Before entering into a loop, it first discards whatever logs are present in
> the debugfs file.
> This way User can first launch this utility and then start a workload/activity
> for which GuC firmware logs are to be actually captured and keep running the
> utility for as long as its needed, like once the workload is over this utility
> can be forcefully stopped.
>
> If the logging wasn't enabled on GuC side by the Driver at boot time, utility
> will first enable the logging and later on when it is stopped (CTRL+C) it will
> also pause the logging on GuC side.
>
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> ---
> tools/Makefile.sources | 1 +
> tools/intel_guc_logger.c | 441 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 442 insertions(+)
> create mode 100644 tools/intel_guc_logger.c
>
> diff --git a/tools/Makefile.sources b/tools/Makefile.sources
> index 2bb6c8e..be58871 100644
> --- a/tools/Makefile.sources
> +++ b/tools/Makefile.sources
> @@ -19,6 +19,7 @@ tools_prog_lists = \
> intel_gpu_time \
> intel_gpu_top \
> intel_gtt \
> + intel_guc_logger \
> intel_infoframes \
> intel_l3_parity \
> intel_lid \
> diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c
> new file mode 100644
> index 0000000..92172fa
> --- /dev/null
> +++ b/tools/intel_guc_logger.c
> @@ -0,0 +1,441 @@
> +
> +#define _GNU_SOURCE /* For using O_DIRECT */
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <fcntl.h>
> +#include <sched.h>
> +#include <sys/ioctl.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <signal.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <sys/poll.h>
> +#include <sys/wait.h>
> +#include <sys/types.h>
> +#include <sys/syscall.h>
> +#include <sys/mman.h>
> +#include <assert.h>
> +#include <pthread.h>
> +
> +#include "igt.h"
> +
> +#define MB(x) ((uint64_t)(x) * 1024 * 1024)
> +#ifndef PAGE_SIZE
> + #define PAGE_SIZE 4096
> +#endif
> +#define SUBBUF_SIZE (19*PAGE_SIZE)
> +/* Need large buffering from logger side to hide the DISK IO latency, Driver
> + * can only store 8 snapshots of GuC log buffer in relay.
> + */
> +#define NUM_SUBBUFS 100
> +
> +#define RELAY_FILE_NAME "guc_log"
> +#define CONTROL_FILE_NAME "i915_guc_log_control"
> +
> +char *read_buffer;
> +char *out_filename;
> +int poll_timeout = 2; /* by default 2ms timeout */
> +pthread_mutex_t mutex;
> +pthread_t flush_thread;
> +int verbosity_level = 3; /* by default capture logs at max verbosity */
> +uint32_t produced, consumed;
> +uint64_t total_bytes_written;
> +int num_buffers = NUM_SUBBUFS;
> +int relay_fd, outfile_fd = -1;
> +bool stop_logging, discard_oldlogs;
> +uint32_t test_duration, max_filesize;
> +pthread_cond_t underflow_cond, overflow_cond;
> +
> +static void guc_log_control(bool enable_logging)
> +{
> + int control_fd;
> + char data[19];
> + uint64_t val;
> + int ret;
> +
> + control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY);
> + if (control_fd < 0)
> + igt_assert_f(0, "Couldn't open the guc log control file");
> +
> + val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0;
> +
> + snprintf(data, sizeof(data), "0x%" PRIx64, val);
> + ret = write(control_fd, data, strlen(data) + 1);
Minor: It looks safe like it is but something like below would maybe be
more robust?
ret = snprintf(data, sizeof(data), "0x%" PRIx64, val);
igt_assert(ret > 2 && ret < sizeof(data));
ret = write(control_fd, data, ret);
igt_assert(ret > 0); // assuming short writes can't happen
Up to you.
> + if (ret < 0)
> + igt_assert_f(0, "Couldn't write to the log control file");
> +
> + close(control_fd);
> +}
> +
> +static void int_sig_handler(int sig)
> +{
> + igt_info("Received signal %d\n", sig);
> +
> + stop_logging = true;
> +}
> +
> +static void pull_leftover_data(void)
> +{
> + unsigned int bytes_read = 0;
> + int ret;
> +
> + while (1) {
> + /* Read the logs from relay buffer */
> + ret = read(relay_fd, read_buffer, SUBBUF_SIZE);
> + if (!ret)
> + break;
> + else if (ret < 0)
> + igt_assert_f(0, "Failed to read from the guc log file");
> + else if (ret < SUBBUF_SIZE)
> + igt_assert_f(0, "invalid read from relay file");
> +
> + bytes_read += ret;
> +
> + if (outfile_fd > 0) {
>= 0 I think. Or is it even needed since open_output_file asserts if it
fails to open?
> + ret = write(outfile_fd, read_buffer, SUBBUF_SIZE);
> + if (ret < 0)
> + igt_assert_f(0, "Couldn't dump the logs in a file");
Do you need to check for short writes as well here? Man page for write
says it can happen with out of disk space or signals.
> + total_bytes_written += ret;
> + }
> + };
> +
> + igt_debug("%u bytes flushed\n", bytes_read);
> +}
> +
> +static int num_filled_bufs(void)
> +{
> + return (produced - consumed);
> +}
> +
> +static void pull_data(void)
> +{
> + char *ptr;
> + int ret;
> +
> + pthread_mutex_lock(&mutex);
> + while (num_filled_bufs() >= num_buffers) {
> + igt_debug("overflow, will wait, produced %u, consumed %u\n", produced, consumed);
> + /* Stall the main thread in case of overflow, as there are no
> + * buffers available to store the new logs, otherwise there
> + * could be corruption if both threads work on the same buffer.
> + */
> + pthread_cond_wait(&overflow_cond, &mutex);
> + };
> + pthread_mutex_unlock(&mutex);
> +
> + ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE;
> +
> + /* Read the logs from relay buffer */
> + ret = read(relay_fd, ptr, SUBBUF_SIZE);
> + if (ret < 0)
> + igt_assert_f(0, "Failed to read from the guc log file");
> + else if (ret && (ret < SUBBUF_SIZE))
> + igt_assert_f(0, "invalid read from relay file");
> +
> + pthread_mutex_lock(&mutex);
> + if (ret) {
> + produced++;
> + pthread_cond_signal(&underflow_cond);
> + }
> + pthread_mutex_unlock(&mutex);
Very minor: can move the mutex under the if.
> +}
> +
> +static void *flusher(void *arg)
> +{
> + char *ptr;
> + int ret;
> +
> + igt_debug("Execution started of flusher thread\n");
> +
> + do {
> + pthread_mutex_lock(&mutex);
> + while (!num_filled_bufs()) {
> + /* Exit only after completing the flush of all the filled
> + * buffers as User would expect that all logs captured up
> + * till the point of interruption/exit are written out to
> + * the disk file.
> + */
> + if (stop_logging) {
> + igt_debug("flusher to exit now\n");
> + pthread_mutex_unlock(&mutex);
> + return NULL;
> + }
> + pthread_cond_wait(&underflow_cond, &mutex);
> + };
> + pthread_mutex_unlock(&mutex);
> +
> + ptr = read_buffer + (consumed % num_buffers) * SUBBUF_SIZE;
> +
> + ret = write(outfile_fd, ptr, SUBBUF_SIZE);
> + if (ret < 0)
> + igt_assert_f(0, "Couldn't dump the logs in a file");
Question about short writes applies here as well.
> +
> + total_bytes_written += ret;
> + if (max_filesize && (total_bytes_written > MB(max_filesize))) {
> + igt_debug("Reached the target of %" PRIu64 " bytes\n", MB(max_filesize));
> + stop_logging = true;
> + }
> +
> + pthread_mutex_lock(&mutex);
> + consumed++;
> + pthread_cond_signal(&overflow_cond);
> + pthread_mutex_unlock(&mutex);
> + } while(1);
Minor again: You use both "while(1) {}" and "do { } while(1)" in here.
Maybe stick with one? :D (I think "for (;;)" is preferred, but maybe I
am misremembering things.)
> +
> + return NULL;
> +}
> +
> +static void init_flusher_thread(void)
> +{
> + struct sched_param thread_sched;
> + pthread_attr_t p_attr;
> +
> + pthread_cond_init(&underflow_cond, NULL);
> + pthread_cond_init(&overflow_cond, NULL);
> + pthread_mutex_init(&mutex, NULL);
> +
> + if (pthread_attr_init(&p_attr))
> + igt_assert_f(0, "Error obtaining default thread attributes");
> +
> + if (pthread_attr_setinheritsched(&p_attr, PTHREAD_EXPLICIT_SCHED))
> + igt_assert_f(0, "Couldn't set inheritsched");
> +
> + if (pthread_attr_setschedpolicy(&p_attr, SCHED_RR))
> + igt_assert_f(0, "Couldn't set thread scheduling policy");
> +
> + /* Keep the flusher task also at rt priority, so that it doesn't get
> + * too late in flushing the collected logs in local buffers to the disk,
> + * and so main thread always have spare buffers to collect the logs.
> + */
> + thread_sched.sched_priority = 5;
> + if (pthread_attr_setschedparam(&p_attr, &thread_sched))
> + igt_assert_f(0, "Couldn't set thread priority");
> +
> + if (pthread_create(&flush_thread, &p_attr, flusher, NULL))
> + igt_assert_f(0, "thread creation failed");
More nitpicks: You mix starting error messages with upper case and lower
case.
> +
> + if (pthread_attr_destroy(&p_attr))
> + igt_assert_f(0, "Error destroying thread attributes");
> +}
> +
> +static void open_relay_file(void)
> +{
> + relay_fd = igt_debugfs_open(RELAY_FILE_NAME, O_RDONLY);
> + if (relay_fd < 0)
> + igt_assert_f(0, "Couldn't open the guc log file");
> +
> + /* Purge the old/boot-time logs from the relay buffer.
> + * This is more for Val team's requirement, where they have to first
> + * purge the existing logs before starting the tests for which the logs
> + * are actually needed. After this logger will enter into a loop and
> + * wait for the new data, at that point benchmark can be launched from
> + * a different shell.
> + */
> + if (discard_oldlogs)
> + pull_leftover_data();
> +}
> +
> +static void open_output_file(void)
> +{
> + char *filename;
> +
> + if (asprintf(&filename, "/tmp/%s", out_filename ? : "guc_log_dump.dat") < 0)
> + igt_assert_f(0, "Couldn't allocate the filename");
I think you need to take the out_filename when specified by the user
as-is and not stick it in /tmp. It is perfectly reasonable that someone
would want to put it somewhere else.
Also better to move the default name of "guc_log_dump.dat" to where
defaults like RELAY_FILE_NAME and such are defined.
And I would not default to /tmp but the current directory.
> +
> + /* Use Direct IO mode for the output file, as the data written is not
> + * supposed to be accessed again, this saves a copy of data from App's
> + * buffer to kernel buffer (Page cache). Due to no buffering on kernel
> + * side, data is flushed out to disk faster and more buffering can be
> + * done on the logger side to hide the disk IO latency.
> + */
> + outfile_fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC | O_DIRECT, 0444);
0400 or 0440 would be better I think.
> + if (outfile_fd < 0)
> + igt_assert_f(0, "Couldn't open the output file");
> +
> + free(out_filename);
> + free(filename);
> +}
> +
> +static void init_main_thread(void)
> +{
> + struct sched_param thread_sched;
> + /* Run the main thread at highest priority to ensure that it always
> + * gets woken-up at earliest on arrival of new data and so is always
> + * ready to pull the logs, otherwise there could be loss logs if
> + * GuC firmware is generating logs at a very high rate.
> + */
> + thread_sched.sched_priority = 1;
> + if (sched_setscheduler(getpid(), SCHED_FIFO, &thread_sched))
> + igt_assert_f(0, "Couldn't set the priority");
> +
> + if (signal(SIGINT, int_sig_handler) == SIG_ERR)
> + igt_assert_f(0, "SIGINT handler registration failed");
> +
> + /* Need an aligned pointer for direct IO */
> + if (posix_memalign((void **)&read_buffer, PAGE_SIZE, num_buffers * SUBBUF_SIZE))
> + igt_assert_f(0, "Couldn't allocate the read buffer");
> +
> + /* Keep the pages locked in RAM, avoid page fault overhead */
> + if (mlock(read_buffer, num_buffers * SUBBUF_SIZE))
> + igt_assert_f(0, "Failed to lock memory");
> +
> + /* Enable the logging, it may not have been enabled from boot and so
> + * the relay file also wouldn't have been created.
> + */
> + guc_log_control(true);
> +
> + open_relay_file();
> + open_output_file();
> +}
> +
> +static int parse_options(int opt, int opt_index, void *data)
> +{
> + igt_debug("opt %c optarg %s\n", opt, optarg);
> +
> + switch(opt) {
> + case 'v':
> + verbosity_level = atoi(optarg);
> + if (verbosity_level < 0)
> + verbosity_level = 0;
> + else if (verbosity_level > 3)
> + verbosity_level = 3;
Notify user if invalid value was provided?
> +
> + igt_debug("verbosity level to be used is %d\n", verbosity_level);
> + break;
> + case 'o':
> + if (asprintf(&out_filename, "%s", optarg) < 0)
strdup?
> + igt_assert_f(0, "Couldn't allocate the o/p filename");
> + igt_debug("logs to be stored in file /tmp/%s\n", out_filename);
> + break;
> + case 'b':
> + num_buffers = atoi(optarg);
> + if (num_buffers == 0)
> + num_buffers = NUM_SUBBUFS;
Again notify if invalid number/value.
> + igt_debug("number of buffers to be used is %d\n", num_buffers);
> + break;
> + case 't':
> + test_duration = atoi(optarg);
> + if (test_duration > 0)
If switch was specified it should have a valid optarg or error reported.
> + igt_debug("logger to run for %d seconds\n", test_duration);
> + break;
> + case 'p':
> + poll_timeout = atoi(optarg);
> + if (poll_timeout == 0)
> + poll_timeout = 2; /* at least use 2ms poll timeout */
Hm I would prefer if you allowed no poll period. I even think it should
be the default if we think relayfs will be definitely fixed to not
require it.
In any case, commend is wrong since user can specify 1ms and get away
with it. :)
> + if (poll_timeout > 0)
> + igt_debug("polling to be done with %d milli seconds timeout\n", poll_timeout);
milliseconds
> + break;
> + case 's':
> + max_filesize = atoi(optarg);
> + if (max_filesize > 0)
Same as for -t.
> + igt_debug("max allowed size of the output file is %d MBs\n", max_filesize);
> + break;
> + case 'd':
> + discard_oldlogs = true;
> + igt_debug("old/boot-time logs will be discarded\n");
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static void process_command_line(int argc, char **argv)
> +{
> + static struct option long_options[] = {
> + {"verbosity", required_argument, 0, 'v'},
> + {"outputfilename", required_argument, 0, 'o'},
> + {"buffers", required_argument, 0, 'b'},
> + {"testduration", required_argument, 0, 't'},
> + {"polltimeout", required_argument, 0, 'p'},
> + {"size", required_argument, 0, 's'},
> + {"discard", no_argument, 0, 'd'},
> + { 0, 0, 0, 0 }
> + };
> +
> + const char *help =
> + " -v --verbosity=level verbosity level of GuC logging (0-3)\n"
> + " -o --outfile=name name of the output file in /tmp directory, where logs will be stored\n"
> + " -b --buffers=num number of buffers to be maintained on logger side for storing logs\n"
> + " -t --testduration=sec max duration in seconds for which the logger should run, 0 == forever\n"
> + " -p --polltimeout=ms polling timeout in ms, -1 == indefinite wait for the new data\n"
> + " -s --size=MB max size of output file in MBs after which logging will be stopped, 0 == no limit\n"
> + " -d --discard discard the old/boot-time logs before entering into the capture loop\n";
> +
> + igt_simple_init_parse_opts(&argc, argv, "v:o:b:t:p:s:d", long_options,
> + help, parse_options, NULL);
> +}
> +
> +int main(int argc, char **argv)
> +{
> + struct pollfd relay_poll_fd;
> + struct timespec start={};
> + int nfds;
> + int ret;
> +
> + process_command_line(argc, argv);
> +
> + init_main_thread();
> +
> + /* Use a separate thread for flushing the logs to a file on disk.
> + * Main thread will buffer the data from relay file in its pool of
> + * buffers and other thread will flush the data to disk in background.
> + * This is needed, albeit by default data is written out to disk in
> + * async mode, as when there are too many dirty pages in the RAM,
> + * (/proc/sys/vm/dirty_ratio), kernel starts blocking the processes
> + * doing the file writes.
> + */
> + init_flusher_thread();
> +
> + relay_poll_fd.fd = relay_fd;
> + relay_poll_fd.events = POLLIN;
> + relay_poll_fd.revents = 0;
> +
> + nfds = 1; /* only one fd to poll */
> +
> + while (!stop_logging)
> + {
> + if (test_duration && (igt_seconds_elapsed(&start) > test_duration)) {
If you agree to allow no poll period the this would not work right? In
that case you would need to use alarm(2) or something.
> + igt_debug("Ran for stipulated %d seconds, exit now\n", test_duration);
> + stop_logging = true;
> + break;
> + }
> +
> + /* Wait/poll for the new data to be available, relay doesn't
> + * provide a blocking read.
> + * Need to do polling with a timeout instead of indefinite wait,
> + * to avoid relying on relay for the wakeup, as relay does wakeup
> + * in a deferred manner on jiffies granularity by scheduling a
> + * timer and moreover that timer is re-scheduled on every newly
> + * produced buffer so timer keeps getting pushed out if there
> + * are multiple flush interrupts in a very quick succession (less
> + * than a jiffy gap between 2 flush interrupts) and relay runs
> + * out of sub buffers to store the new logs.
> + */
> + ret = poll(&relay_poll_fd, nfds, poll_timeout);
> + if (ret < 0) {
> + if (errno == EINTR)
> + break;
> + else
> + igt_assert_f(0, "poll call failed");
> + }
> +
> + /* No data available yet, poll again, hopefully new data is round the corner */
> + if (!relay_poll_fd.revents)
> + continue;
> +
> + pull_data();
> + }
> +
> + /* Pause logging on the GuC side*/
> + guc_log_control(false);
> + pthread_cond_signal(&underflow_cond);
> + pthread_join(flush_thread, NULL);
> + pull_leftover_data();
> + igt_info("total_bytes_written %" PRIu64 "\n", total_bytes_written);
> +
> + free(read_buffer);
> + close(relay_fd);
> + close(outfile_fd);
> + igt_exit();
> +}
>
So only minor comments. The rest looks reasonable to me and I trust you
tested it well and it works fine.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list