[Intel-gfx] [PATCH v4] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Oct 12 13:32:17 UTC 2016
On 10/10/2016 15:31, Goel, Akash wrote:
>
>
> On 10/10/2016 7:22 PM, Tvrtko Ursulin wrote:
>>
>> On 10/10/2016 11:59, akash.goel at intel.com wrote:
>>> From: Akash Goel <akash.goel at intel.com>
>>>
>>> This patch provides a test utility which helps capture GuC firmware
>>> logs and
>>> then dump them to file.
>>> The logs are pulled from a debugfs file
>>> '/sys/kernel/debug/dri/guc_log' and
>>> by default stored into a file 'guc_log_dump.dat'. The name,
>>> including the
>>> location, of the output file can be changed through a command line
>>> argument.
>>>
>>> The utility goes into an infinite loop where it waits for the arrival
>>> of new
>>> logs and as soon as new set of logs are produced it captures them in
>>> its local
>>> buffer which is then flushed out to the file on disk.
>>> Any time when logging needs to be ended, User can stop this utility
>>> (CTRL+C).
>>>
>>> Before entering into a loop, it first discards whatever logs are
>>> present in
>>> the debugfs file.
>>> This way User can first launch this utility and then start a
>>> workload/activity
>>> for which GuC firmware logs are to be actually captured and keep
>>> running the
>>> utility for as long as its needed, like once the workload is over this
>>> utility
>>> can be forcefully stopped.
>>>
>>> If the logging wasn't enabled on GuC side by the Driver at boot time,
>>> utility
>>> will first enable the logging and later on when it is stopped (CTRL+C)
>>> it will
>>> also pause the logging on GuC side.
>>>
>>> v2:
>>> - Use combination of alarm system call & SIGALRM signal to run the
>>> utility
>>> for required duration. (Tvrtko)
>>> - Fix inconsistencies, do minor cleanup and refactoring. (Tvrtko)
>>>
>>> v3:
>>> - Fix discrepancy for the output file command line option and update
>>> the
>>> Usage/help string.
>>>
>>> v4:
>>> - Update the exit condition for flusher thread, now will exit only
>>> after
>>> the capture loop is over and not when the flag to stop logging is
>>> set.
>>> This handles a corner case, due to which the dump of last captured
>>> buffer
>>> was getting missed.
>>> - Add a newline character at the end of assert messages.
>>> - Avoid the assert for the case, which occurs very rarely, when there
>>> are no
>>> bytes read from the relay file.
>>>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> Signed-off-by: Akash Goel <akash.goel at intel.com>
>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com> (v3)
>>> ---
>>> tools/Makefile.sources | 1 +
>>> tools/intel_guc_logger.c | 438
>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 439 insertions(+)
>>> create mode 100644 tools/intel_guc_logger.c
>>>
>>> diff --git a/tools/Makefile.sources b/tools/Makefile.sources
>>> index 2bb6c8e..be58871 100644
>>> --- a/tools/Makefile.sources
>>> +++ b/tools/Makefile.sources
>>> @@ -19,6 +19,7 @@ tools_prog_lists = \
>>> intel_gpu_time \
>>> intel_gpu_top \
>>> intel_gtt \
>>> + intel_guc_logger \
>>> intel_infoframes \
>>> intel_l3_parity \
>>> intel_lid \
>>> diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c
>>> new file mode 100644
>>> index 0000000..159a54e
>>> --- /dev/null
>>> +++ b/tools/intel_guc_logger.c
>>> @@ -0,0 +1,438 @@
>>> +
>>> +#define _GNU_SOURCE /* For using O_DIRECT */
>>> +#include <inttypes.h>
>>> +#include <stdio.h>
>>> +#include <fcntl.h>
>>> +#include <sched.h>
>>> +#include <sys/ioctl.h>
>>> +#include <string.h>
>>> +#include <errno.h>
>>> +#include <signal.h>
>>> +#include <stdlib.h>
>>> +#include <unistd.h>
>>> +#include <sys/poll.h>
>>> +#include <sys/wait.h>
>>> +#include <sys/types.h>
>>> +#include <sys/syscall.h>
>>> +#include <sys/mman.h>
>>> +#include <assert.h>
>>> +#include <pthread.h>
>>> +
>>> +#include "igt.h"
>>> +
>>> +#define MB(x) ((uint64_t)(x) * 1024 * 1024)
>>> +#ifndef PAGE_SIZE
>>> + #define PAGE_SIZE 4096
>>> +#endif
>>> +/* Currently the size of GuC log buffer is 19 pages & so is the size
>>> of relay
>>> + * subbuffer. If the size changes in future, then this define also
>>> needs to be
>>> + * updated accordingly.
>>> + */
>>> +#define SUBBUF_SIZE (19*PAGE_SIZE)
>>> +/* Need large buffering from logger side to hide the DISK IO latency,
>>> Driver
>>> + * can only store 8 snapshots of GuC log buffer in relay.
>>> + */
>>> +#define NUM_SUBBUFS 100
>>> +
>>> +#define RELAY_FILE_NAME "guc_log"
>>> +#define DEFAULT_OUTPUT_FILE_NAME "guc_log_dump.dat"
>>> +#define CONTROL_FILE_NAME "i915_guc_log_control"
>>> +
>>> +char *read_buffer;
>>> +char *out_filename;
>>> +int poll_timeout = 2; /* by default 2ms timeout */
>>> +pthread_mutex_t mutex;
>>> +pthread_t flush_thread;
>>> +int verbosity_level = 3; /* by default capture logs at max
>>> verbosity */
>>> +uint32_t produced, consumed;
>>> +uint64_t total_bytes_written;
>>> +int num_buffers = NUM_SUBBUFS;
>>> +int relay_fd, outfile_fd = -1;
>>> +uint32_t test_duration, max_filesize;
>>> +pthread_cond_t underflow_cond, overflow_cond;
>>> +bool stop_logging, discard_oldlogs, capturing_stopped;
>>> +
>>> +static void guc_log_control(bool enable_logging)
>>> +{
>>> + int control_fd;
>>> + char data[19];
>>> + uint64_t val;
>>> + int ret;
>>> +
>>> + control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY);
>>> + igt_assert_f(control_fd >= 0, "couldn't open the guc log control
>>> file\n");
>>> +
>>> + val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0;
>>> +
>>> + ret = snprintf(data, sizeof(data), "0x%" PRIx64, val);
>>> + igt_assert(ret > 2 && ret < sizeof(data));
>>> +
>>> + ret = write(control_fd, data, ret);
>>> + igt_assert_f(ret > 0, "couldn't write to the log control file\n");
>>> +
>>> + close(control_fd);
>>> +}
>>> +
>>> +static void int_sig_handler(int sig)
>>> +{
>>> + igt_info("received signal %d\n", sig);
>>> +
>>> + stop_logging = true;
>>> +}
>>> +
>>> +static void pull_leftover_data(void)
>>> +{
>>> + unsigned int bytes_read = 0;
>>> + int ret;
>>> +
>>> + do {
>>> + /* Read the logs from relay buffer */
>>> + ret = read(relay_fd, read_buffer, SUBBUF_SIZE);
>>> + if (!ret)
>>> + break;
>>> +
>>> + igt_assert_f(ret > 0, "failed to read from the guc log
>>> file\n");
>>> + igt_assert_f(ret == SUBBUF_SIZE, "invalid read from relay
>>> file\n");
>>> +
>>> + bytes_read += ret;
>>> +
>>> + if (outfile_fd >= 0) {
>>> + ret = write(outfile_fd, read_buffer, SUBBUF_SIZE);
>>> + igt_assert_f(ret == SUBBUF_SIZE, "couldn't dump the logs
>>> in a file\n");
>>> + total_bytes_written += ret;
>>> + }
>>> + } while(1);
>>> +
>>> + igt_debug("%u bytes flushed\n", bytes_read);
>>> +}
>>> +
>>> +static int num_filled_bufs(void)
>>> +{
>>> + return (produced - consumed);
>>> +}
>>> +
>>> +static void pull_data(void)
>>> +{
>>> + char *ptr;
>>> + int ret;
>>> +
>>> + pthread_mutex_lock(&mutex);
>>> + while (num_filled_bufs() >= num_buffers) {
>>> + igt_debug("overflow, will wait, produced %u, consumed %u\n",
>>> produced, consumed);
>>> + /* Stall the main thread in case of overflow, as there are no
>>> + * buffers available to store the new logs, otherwise there
>>> + * could be corruption if both threads work on the same
>>> buffer.
>>> + */
>>> + pthread_cond_wait(&overflow_cond, &mutex);
>>> + };
>>> + pthread_mutex_unlock(&mutex);
>>> +
>>> + ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE;
>>> +
>>> + /* Read the logs from relay buffer */
>>> + ret = read(relay_fd, ptr, SUBBUF_SIZE);
>>> + igt_assert_f(ret >= 0, "failed to read from the guc log file\n");
>>> + igt_assert_f(!ret || ret == SUBBUF_SIZE, "invalid read from relay
>>> file\n");
>>> +
>>> + if (ret) {
>>> + pthread_mutex_lock(&mutex);
>>> + produced++;
>>> + pthread_cond_signal(&underflow_cond);
>>> + pthread_mutex_unlock(&mutex);
>>> + } else {
>>> + /* Occasionally (very rare) read from the relay file
>>> returns no
>>> + * data, albeit the polling done prior to read call indicated
>>> + * availability of data.
>>> + */
>>> + igt_debug("no data read from the relay file\n");
>>
>> This worries me. Relayfs kernel side is quite simple which would suggest
>> there is maybe a bug in the logger? And it is very important that we
>> trust logger correctly capturing data. Because people will spend a lot
>> of time debugging issues from the logs and it would be bad to have to
>> chase irregularities caused by a problem in the logger.
>>
> Actually I see this issue rarely and only after logging some GBs of
> data. And most likely it won't have any bearing on the correctness of
> captured data, the missed snapshot shall be pulled in the next
> iteration of capture loop. Also no-overwrite mode of relay is being used.
>
> As per my hunch, it can't be ruled out that this is not a relayfs issue.
>
>> Could you please try and identify why exactly is this happening?
> Will try to investigate both from relayfs side as well as logger side,
> but it would take some time.
> This may not be considered a blocker.
>
As you reported off list, after some deep investigation there seems to
be no potential for lost packets. "Empty wakeup" is just a consequence
of unsynchronized nature of relayfs.
In the light of that I am happy with the logger, so:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko
More information about the Intel-gfx
mailing list