[systemd-devel] Empty journal files consume space
Steve Traylen
steve.traylen at cern.ch
Thu Feb 1 14:43:12 UTC 2024
On 01/02/2024 14:48, Steve Traylen wrote:
> On 01/02/2024 13:45, Andrei Borzenkov wrote:
>
>> On Thu, Feb 1, 2024 at 3:25 PM Steve Traylen <steve.traylen at cern.ch>
>> wrote:
>>> Hi,
>>>
>>> I'm trying to understand why I am retaining only a couple of days
>>> of logs when I would like to keep more.
>>>
>>> The head of the system journal, as shown by journalctl, is only from today:
>>> Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]: Data hash table
>>> of /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal has
>>> a fill level at 75.0 (174765 of 233016 items, 58720256 file size, 335
>>> bytes per hash table item), suggesting rotation.
>>> Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]:
>>> /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal:
>>> Journal header limits reached or header out-of-date, rotating.
>>>
>>>
>>> # journalctl --disk-usage
>>> Archived and active journals take up 8.1G in the file system.
>>>
>>> In reality, the system journal is tiny:
>>>
>>> # du -sh system.journal
>>> 17M system.journal
>>>
>>> However, we do have many user journals:
>>>
>>> # ls -l user-*journal | wc -l
>>> 1044
>>>
>>> and indeed
>>>
>>> # du -sh /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
>>> 8.2G /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
>>>
>>> The vast majority of these user journals are empty and offline
>>>
>>> # file user-*journal | awk '{print $4, $5}' | sort | uniq -c
>>> 940 empty, offline
>>> 102 offline
>>> 2 online
>>>
>>>
>>> These user journals are all 8.0M in size.
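A quick back-of-the-envelope check, using only the counts quoted above, shows where the space goes. It assumes that "8.0M" means 8 MiB, as du -h and ls -lh report it; journal_mib and mib_to_gib are hypothetical helper names used only for this illustration.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope arithmetic from the numbers quoted in this thread.
# Assumption: "8.0M" is 8 MiB, the unit du -h / ls -lh use.

journal_mib() {
    # Total MiB used by $1 journal files of $2 MiB each (integer maths).
    echo $(( $1 * $2 ))
}

mib_to_gib() {
    # MiB -> GiB, rounded to one decimal place (awk does the division).
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1024 }'
}

echo "all 1044 user journals: $(mib_to_gib "$(journal_mib 1044 8)") GiB"
echo "940 empty ones:         $(mib_to_gib "$(journal_mib 940 8)") GiB"
echo "SystemMaxUse:           3 GiB"
```

The total (about 8.2 GiB) matches what du reported for the directory, and the empty user journals alone come to more than twice the 3G SystemMaxUse budget.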
>>>
>>> So I think I have two questions:
>>>
>>> 1) Why am I losing old logs sooner than I would like? What limit does
>>> "fill level at 75.0 (174765 of 233016 items" refer to?
>> You did not provide any evidence that logs are lost. Archived
>> (offline) logs are processed and searched by journalctl so the oldest
>> available log is the oldest archive file, not the current online file.
>>
>> The limit is the fill level of the hash table in the individual log
>> file. It is hard-coded and unrelated to the limits configured in the
>> journald.conf. It may affect how long logs are kept if you configured
>> retention by the number of log files.
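The "75.0" in the log line can be reproduced from the two counts it prints. The sketch below assumes, based on a reading of journald's source rather than any documented interface, that rotation is suggested once more than three quarters of the data hash table slots are in use, written in integer maths as n_data * 4 > slots * 3. The function names fill_level and suggests_rotation are made up for illustration.

```shell
#!/usr/bin/env bash
# Reproduce the arithmetic behind journald's "fill level at 75.0" message
# from the two counts printed in that log line.
n_data=174765      # items in the data hash table ("174765 of ...")
slots=233016       # hash table capacity ("... of 233016 items")

fill_level() {
    # Percentage of hash table slots in use, one decimal place.
    awk -v n="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 100 * n / c }'
}

suggests_rotation() {
    # Assumed rule: more than 3/4 of the slots used, in integer maths.
    (( $1 * 4 > $2 * 3 ))
}

echo "fill level: $(fill_level "$n_data" "$slots")%"
if suggests_rotation "$n_data" "$slots"; then
    echo "above 75%: journald would suggest rotating this file"
fi
```

Here 174765 * 4 = 699060 is just above 233016 * 3 = 699048, which is why a file that is only 56M triggers rotation regardless of SystemMaxUse.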
> Thanks for reply.
>
> I believe there are no archive files:
>
> # ls /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/*system*
> /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/system.journal
>
> The archive files would be alongside the live file, I believe.
>
> I just tried an explicit "journalctl --rotate", which logs:
>
> Feb 01 14:36:33 nodeX.example.ch systemd-journald[658]: System Journal
> (/var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e) is 8.0G, max 3.0G,
> 0B free.
> Feb 01 14:36:40 nodeX.example.ch systemd-journald[658]: Received
> client request to rotate journal, rotating.
> Feb 01 14:36:40 nodeX.example.ch systemd-journald[658]: Deleted empty
> archived journal
> /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/user-1234@537a18390e124dd6b4cf41a69ef5780d-0000000000000000-0000000000000000.journal
> (3.5M).
> Feb 01 14:36:40 lxplus978.cern.ch systemd-journald[658]: Deleted empty
> archived journal
> /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/user-1235@d7d23966c1454001a714ee5aef039c60-0000000000000000-0000000000000000.journal
> (3.5M).
>
> So now I think I understand: at rotation I am already over the configured
> maximum of 3GB, so perhaps no archive is generated. On another node,
> where fewer users have ever logged in, I do have archives of the system
> journal and a longer history. Those 940 "empty, offline" user journals
> consume that space while providing no particular value.
>
> There is no other indication that rotation failed.
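Before deleting anything, a dry run can confirm how much space the "empty, offline" user journals actually hold. This is a sketch, not a tested tool: it assumes GNU stat and that file --brief prints "Journal file empty, offline" for these journals (as the file output above suggests), and report_empty_user_journals is a hypothetical helper. Nothing is removed.

```shell
#!/usr/bin/env bash
# Dry run: list empty, offline user journals in a journal directory and
# total the bytes that deleting them would free. Deletes nothing.
shopt -s nullglob   # an unmatched glob expands to nothing, not itself

report_empty_user_journals() {
    local dir=$1 f size
    local count=0 bytes=0
    # Matches both user-UID.journal and archived user-UID@....journal
    for f in "$dir"/user-[0-9]*.journal; do
        if [ "$(file --brief "$f")" = 'Journal file empty, offline' ]; then
            size=$(stat -c %s "$f")   # apparent size in bytes (GNU stat)
            echo "would remove $(basename "$f") ($size bytes)"
            count=$(( count + 1 ))
            bytes=$(( bytes + size ))
        fi
    done
    echo "$bytes bytes in $count files"
}

report_empty_user_journals "/var/log/journal/$(cat /etc/machine-id)"
```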
>
>
>>> 2) Is there a safe mechanism to delete those empty offline user
>>> journals?
>>>
>> Just delete them.
I wrote a tiny script to delete them:

shopt -s extglob   # the +(...) pattern below needs extended globbing
for FILE in /var/log/journal/$(cat /etc/machine-id)/user-+([0-9]*).journal ; do
    if [ "$(file --brief "$FILE")" = 'Journal file empty, offline' ] ; then
        rm -f "$FILE"
        echo "$(basename "$FILE") was empty and offline so removed"
    fi
done

It works perfectly; unfortunately, about 20 seconds later journald (I
presume) re-creates them all, even though the vast majority of those
users have no processes running on the nodes.
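To check how many of the re-created journals belong to users with no running processes, a diagnostic along these lines could map each user journal back to its UID and ask pgrep whether that UID owns any process. It is only a sketch: uid_from_journal and check_user_journals are hypothetical names, and it assumes the user-UID.journal and user-UID@....journal naming seen in the rotation log above.

```shell
#!/usr/bin/env bash
# For each user journal, report whether its UID currently owns a process.
shopt -s nullglob   # an unmatched glob expands to nothing, not itself

uid_from_journal() {
    # user-1234.journal or user-1234@<...>.journal -> 1234
    local name=${1##*/}          # strip the directory part
    name=${name#user-}           # strip the "user-" prefix
    echo "${name%%[@.]*}"        # cut at the first "@" or "."
}

check_user_journals() {
    local f uid
    for f in "$1"/user-[0-9]*.journal; do
        uid=$(uid_from_journal "$f")
        if pgrep -u "$uid" >/dev/null 2>&1; then
            echo "$uid active"
        else
            echo "$uid idle"
        fi
    done | sort -u   # active and archived files share a UID
}

check_user_journals "/var/log/journal/$(cat /etc/machine-id)"
```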
>>
>>> Thanks.
>>>
>>> Steve.
>>>
>>> Version and configuration:
>>>
>>> systemd-252-18.el9 - RHEL9 with a configuration of:
>>>
>>> [Journal]
>>> Storage = persistent
>>> SplitMode = uid
>>> SystemMaxUse = 3G
>>> SystemKeepFree = 10G
>>> MaxRetentionSec = 1year
>>>
>>> # df -h /
>>> Filesystem Size Used Avail Use% Mounted on
>>> /dev/vda1 80G 65G 16G 81% /
>>>
>>>