GSoC'23 - Week 3 Report
Baole Fang
baole.fang at gmail.com
Sun Jun 25 18:52:15 UTC 2023
---
title: "Week 3"
date: 2023-06-15
---
### Model training
Basic [model training](
https://github.com/baolef/libreoffice-ci/blob/main/train.py) pipeline is
completed with [testselect](
https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py)
model. Further optimization is needed to reduce memory and time cost,
together with performance.
Currently, [testselect](
https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py) is
trained on a subset of size 16384 (containing training and testing set) of
the full dataset of size 122019 due to memory cost, and it has reached a
failure recall of 91.4% and saving 90% of unit test computational cost. Its
detailed confusion matrix is shown below:
| | Fail (Predicted) | Pass (Predicted) |
|---------------|------------------|------------------|
| Fail (Actual) | 480 | 45 |
| Pass (Actual) | 556910 | 5045893 |
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libreoffice/attachments/20230625/4dc0c302/attachment.htm>
More information about the LibreOffice
mailing list