[Intel-gfx] [PATCH v4] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Oct 12 13:32:17 UTC 2016


On 10/10/2016 15:31, Goel, Akash wrote:
>
>
> On 10/10/2016 7:22 PM, Tvrtko Ursulin wrote:
>>
>> On 10/10/2016 11:59, akash.goel at intel.com wrote:
>>> From: Akash Goel <akash.goel at intel.com>
>>>
>>> This patch provides a test utility which helps capture GuC firmware
>>> logs and
>>> then dump them to file.
>>> The logs are pulled from a debugfs file
>>> '/sys/kernel/debug/dri/guc_log' and
>>> by default stored into a file 'guc_log_dump.dat'. The name, 
>>> including the
>>> location, of the output file can be changed through a command line
>>> argument.
>>>
>>> The utility goes into an infinite loop where it waits for the arrival
>>> of new
>>> logs and as soon as new set of logs are produced it captures them in
>>> its local
>>> buffer which is then flushed out to the file on disk.
>>> Any time when logging needs to be ended, User can stop this utility
>>> (CTRL+C).
>>>
>>> Before entering into a loop, it first discards whatever logs are
>>> present in
>>> the debugfs file.
>>> This way User can first launch this utility and then start a
>>> workload/activity
>>> for which GuC firmware logs are to be actually captured and keep
>>> running the
>>> utility for as long as its needed, like once the workload is over this
>>> utility
>>> can be forcefully stopped.
>>>
>>> If the logging wasn't enabled on GuC side by the Driver at boot time,
>>> utility
>>> will first enable the logging and later on when it is stopped (CTRL+C)
>>> it will
>>> also pause the logging on GuC side.
>>>
>>> v2:
>>> - Use combination of alarm system call & SIGALRM signal to run the
>>> utility
>>>    for required duration. (Tvrtko)
>>> - Fix inconsistencies, do minor cleanup and refactoring. (Tvrtko)
>>>
>>> v3:
>>> - Fix discrepancy for the output file command line option and update 
>>> the
>>>    Usage/help string.
>>>
>>> v4:
>>> - Update the exit condition for flusher thread, now will exit only 
>>> after
>>>    the capture loop is over and not when the flag to stop logging is 
>>> set.
>>>    This handles a corner case, due to which the dump of last captured
>>> buffer
>>>    was getting missed.
>>> - Add a newline character at the end of assert messages.
>>> - Avoid the assert for the case, which occurs very rarely, when there
>>> are no
>>>    bytes read from the relay file.
>>>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> Signed-off-by: Akash Goel <akash.goel at intel.com>
>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com> (v3)
>>> ---
>>>   tools/Makefile.sources   |   1 +
>>>   tools/intel_guc_logger.c | 438
>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 439 insertions(+)
>>>   create mode 100644 tools/intel_guc_logger.c
>>>
>>> diff --git a/tools/Makefile.sources b/tools/Makefile.sources
>>> index 2bb6c8e..be58871 100644
>>> --- a/tools/Makefile.sources
>>> +++ b/tools/Makefile.sources
>>> @@ -19,6 +19,7 @@ tools_prog_lists =        \
>>>       intel_gpu_time        \
>>>       intel_gpu_top        \
>>>       intel_gtt        \
>>> +    intel_guc_logger        \
>>>       intel_infoframes    \
>>>       intel_l3_parity        \
>>>       intel_lid        \
>>> diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c
>>> new file mode 100644
>>> index 0000000..159a54e
>>> --- /dev/null
>>> +++ b/tools/intel_guc_logger.c
>>> @@ -0,0 +1,438 @@
>>> +
>>> +#define _GNU_SOURCE  /* For using O_DIRECT */
>>> +#include <inttypes.h>
>>> +#include <stdio.h>
>>> +#include <fcntl.h>
>>> +#include <sched.h>
>>> +#include <sys/ioctl.h>
>>> +#include <string.h>
>>> +#include <errno.h>
>>> +#include <signal.h>
>>> +#include <stdlib.h>
>>> +#include <unistd.h>
>>> +#include <sys/poll.h>
>>> +#include <sys/wait.h>
>>> +#include <sys/types.h>
>>> +#include <sys/syscall.h>
>>> +#include <sys/mman.h>
>>> +#include <assert.h>
>>> +#include <pthread.h>
>>> +
>>> +#include "igt.h"
>>> +
>>> +#define MB(x) ((uint64_t)(x) * 1024 * 1024)
>>> +#ifndef PAGE_SIZE
>>> +  #define PAGE_SIZE 4096
>>> +#endif
>>> +/* Currently the size of GuC log buffer is 19 pages & so is the size
>>> of relay
>>> + * subbuffer. If the size changes in future, then this define also
>>> needs to be
>>> + * updated accordingly.
>>> + */
>>> +#define SUBBUF_SIZE (19*PAGE_SIZE)
>>> +/* Need large buffering from logger side to hide the DISK IO latency,
>>> Driver
>>> + * can only store 8 snapshots of GuC log buffer in relay.
>>> + */
>>> +#define NUM_SUBBUFS 100
>>> +
>>> +#define RELAY_FILE_NAME  "guc_log"
>>> +#define DEFAULT_OUTPUT_FILE_NAME  "guc_log_dump.dat"
>>> +#define CONTROL_FILE_NAME "i915_guc_log_control"
>>> +
>>> +char *read_buffer;
>>> +char *out_filename;
>>> +int poll_timeout = 2; /* by default 2ms timeout */
>>> +pthread_mutex_t mutex;
>>> +pthread_t flush_thread;
>>> +int verbosity_level = 3; /* by default capture logs at max 
>>> verbosity */
>>> +uint32_t produced, consumed;
>>> +uint64_t total_bytes_written;
>>> +int num_buffers = NUM_SUBBUFS;
>>> +int relay_fd, outfile_fd = -1;
>>> +uint32_t test_duration, max_filesize;
>>> +pthread_cond_t underflow_cond, overflow_cond;
>>> +bool stop_logging, discard_oldlogs, capturing_stopped;
>>> +
>>> +static void guc_log_control(bool enable_logging)
>>> +{
>>> +    int control_fd;
>>> +    char data[19];
>>> +    uint64_t val;
>>> +    int ret;
>>> +
>>> +    control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY);
>>> +    igt_assert_f(control_fd >= 0, "couldn't open the guc log control
>>> file\n");
>>> +
>>> +    val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0;
>>> +
>>> +    ret = snprintf(data, sizeof(data), "0x%" PRIx64, val);
>>> +    igt_assert(ret > 2 && ret < sizeof(data));
>>> +
>>> +    ret = write(control_fd, data, ret);
>>> +    igt_assert_f(ret > 0, "couldn't write to the log control file\n");
>>> +
>>> +    close(control_fd);
>>> +}
>>> +
>>> +static void int_sig_handler(int sig)
>>> +{
>>> +    igt_info("received signal %d\n", sig);
>>> +
>>> +    stop_logging = true;
>>> +}
>>> +
>>> +static void pull_leftover_data(void)
>>> +{
>>> +    unsigned int bytes_read = 0;
>>> +    int ret;
>>> +
>>> +    do {
>>> +        /* Read the logs from relay buffer */
>>> +        ret = read(relay_fd, read_buffer, SUBBUF_SIZE);
>>> +        if (!ret)
>>> +            break;
>>> +
>>> +        igt_assert_f(ret > 0, "failed to read from the guc log 
>>> file\n");
>>> +        igt_assert_f(ret == SUBBUF_SIZE, "invalid read from relay
>>> file\n");
>>> +
>>> +        bytes_read += ret;
>>> +
>>> +        if (outfile_fd >= 0) {
>>> +            ret = write(outfile_fd, read_buffer, SUBBUF_SIZE);
>>> +            igt_assert_f(ret == SUBBUF_SIZE, "couldn't dump the logs
>>> in a file\n");
>>> +            total_bytes_written += ret;
>>> +        }
>>> +    } while(1);
>>> +
>>> +    igt_debug("%u bytes flushed\n", bytes_read);
>>> +}
>>> +
>>> +static int num_filled_bufs(void)
>>> +{
>>> +    return (produced - consumed);
>>> +}
>>> +
>>> +static void pull_data(void)
>>> +{
>>> +    char *ptr;
>>> +    int ret;
>>> +
>>> +    pthread_mutex_lock(&mutex);
>>> +    while (num_filled_bufs() >= num_buffers) {
>>> +        igt_debug("overflow, will wait, produced %u, consumed %u\n",
>>> produced, consumed);
>>> +        /* Stall the main thread in case of overflow, as there are no
>>> +         * buffers available to store the new logs, otherwise there
>>> +         * could be corruption if both threads work on the same 
>>> buffer.
>>> +         */
>>> +        pthread_cond_wait(&overflow_cond, &mutex);
>>> +    };
>>> +    pthread_mutex_unlock(&mutex);
>>> +
>>> +    ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE;
>>> +
>>> +    /* Read the logs from relay buffer */
>>> +    ret = read(relay_fd, ptr, SUBBUF_SIZE);
>>> +    igt_assert_f(ret >= 0, "failed to read from the guc log file\n");
>>> +    igt_assert_f(!ret || ret == SUBBUF_SIZE, "invalid read from relay
>>> file\n");
>>> +
>>> +    if (ret) {
>>> +        pthread_mutex_lock(&mutex);
>>> +        produced++;
>>> +        pthread_cond_signal(&underflow_cond);
>>> +        pthread_mutex_unlock(&mutex);
>>> +    } else {
>>> +        /* Occasionally (very rare) read from the relay file 
>>> returns no
>>> +         * data, albeit the polling done prior to read call indicated
>>> +         * availability of data.
>>> +         */
>>> +        igt_debug("no data read from the relay file\n");
>>
>> This worries me. Relayfs kernel side is quite simple which would suggest
>> there is maybe a bug in the logger? And it is very important that we
>> trust logger correctly capturing data. Because people will spend a lot
>> of time debugging issues from the logs and it would be bad to have to
>> chase irregularities caused by a problem in the logger.
>>
> Actually I see this issue rarely and only after logging some GBs of 
> data. And most likely it won't have any bearing on the correctness of 
> captured data, the missed snapshot shall be pulled in the next 
> iteration of capture loop. Also no-overwrite mode of relay is being used.
>
> As per my hunch, it can't be ruled out that this is not a relayfs issue.
>
>> Could you please try and identify why exactly is this happening?
> Will try to investigate both from relayfs side as well as logger side, 
> but it would take some time.
> This may not be considered a blocker.
>

As you reported off list, after some deep investigation there seems to 
be no potential for lost packets. "Empty wakeup" is just a consequence 
of unsynchronized nature of relayfs.

In the light of that I am happy with the logger, so:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko



More information about the Intel-gfx mailing list