GSoC 25: BASIC IDE - Insights from Data Discovery & C++ PoCs [WEEK 4]

Devansh Varshney varshney.devansh614 at gmail.com
Mon Jun 16 09:25:04 UTC 2025


Hi everyone,

This week has been one of the best learning experiences for me,
especially digging into the "behind-the-scenes" of LibreOffice's UNO APIs.

My initial work (Gerrit 185362
<https://gerrit.libreoffice.org/c/core/+/185362>) was a first step, but
feedback from my mentors in our meetings provided a crucial directive:
first, figure out how to get the data. Before we can build a great
auto-completion system, we need a deep, proven understanding of where all
the information (for BASIC, UNO, ScriptForge, etc.) lives and how to access
it programmatically.

This led to a fascinating dive into the UNO data pipeline.

*Understanding the UNO Data Pipeline: From IDL to Runtime*
For anyone curious about how UNO works under the hood, here's a breakdown of
what I've learned. It's a pipeline that turns human-readable API definitions
into an efficient system the application uses at runtime.

    *IDL (Interface Definition Language):* This is the source of truth for
    all UNO APIs. These .idl text files define every service, interface,
    method, property, struct, and enum.
        *Locations:* udkapi/ (core types) and offapi/ (office-specific
        types).

    *idlc & regmerge:* During the build, idlc (the IDL Compiler)
    compiles .idl files into intermediate binary .urd files. Then,
    regmerge combines these into .rdb (Registry Database) files.

    *.rdb Files:* These are the optimized binary databases that LibreOffice
    loads at startup. Key files include types.rdb (from udkapi.rdb etc.),
    services.rdb, and offapi.rdb. These are installation artifacts, not
    source files, which explains why my initial search of the source tree
    did not find them!

    *theCoreReflection:* At runtime, this powerful UNO service provides
    live, programmatic access to all the type information that was loaded
    from the .rdb files.

    *regview Tool:* A command-line tool (registry/tools/regview.cxx)
    designed to dump the contents of an .rdb file. My initial attempts
    to use it were unsuccessful, which, along with mentor guidance, led us
    to pivot our strategy.

    *SbUnoObject & XIntrospectionAccess:* The bridge in BASIC for
    interacting with live UNO objects, using dynamic introspection to
    discover their capabilities.

*A simplified flow of this pipeline looks like this:*

*.idl Files* (Source of Truth)
      |
      v (idlc)
*.urd Files* (Binary intermediate)
      |
      v (regmerge)
*.rdb Files* (Loaded by LO Runtime)
      |
      +--> <LO Runtime Type System>
      |    (Accessible via theCoreReflection)
      |
      +--> *regview Tool* (Reads .rdb) --> <Textual Dump>


*Understanding ScriptForge (wizards/source/scriptforge/)*

I also looked into ScriptForge, which is crucial for modern BASIC scripting.
https://gerrit.libreoffice.org/c/core/+/164867
   - *.xlb files* are XML manifests listing the modules of a library.
   - *.xba files* are XML files, each wrapping the BASIC source of one
     module (not ZIP-like packages).
   - *.pyi file* is a Python stub that provides type hints to Python IDEs
     for auto-completion. As Rafael Lima mentioned, this might be manually
     created, making it a great model for the kind of rich API definition
     we want to achieve for BASIC.

*How its information becomes available:*

*BASIC module source (in .xba files listed in .xlb)*
                    |
                    v (Loaded by BasicManager/StarBASIC)
*<SbModule objects with source code>*
                    |
                    v (Compiled by SbiParser)
*<SbMethod, SbxVariable symbols within the SbModule>*

            *--- Parallel path for Python tooling ---*
*.pyi file (wizards/source/scriptforge/python/scriptforge.pyi)*
                    |
                    v (Read by Python IDEs)
*<Type hints for Python auto-completion>*
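
To make the manifest step concrete, here is a minimal sketch of pulling
entry names out of an .xlb-style XML manifest with a regular expression. It
assumes entries look like <library:element library:name="..."/>; the helper
name listManifestElements and the sample manifest are made up for
illustration and are not taken from the ScriptForge sources, so check the
pattern against a real script.xlb before relying on it.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the library:name value of every <library:element .../> entry.
// Assumption: one attribute per element, written exactly in this form.
std::vector<std::string> listManifestElements(const std::string& xlb)
{
    static const std::regex elementRe(
        R"rx(<library:element\s+library:name="([^"]*)"\s*/>)rx");
    std::vector<std::string> names;
    for (std::sregex_iterator it(xlb.begin(), xlb.end(), elementRe), end;
         it != end; ++it)
        names.push_back((*it)[1].str());
    return names;
}

// Illustrative manifest only; library and module names are hypothetical.
const std::string kSampleXlb = R"xml(<?xml version="1.0" encoding="UTF-8"?>
<library:library xmlns:library="http://openoffice.org/2000/library"
                 library:name="DemoLib" library:readonly="false">
 <library:element library:name="ModuleA"/>
 <library:element library:name="ModuleB"/>
</library:library>)xml";
```

A real tool would use a proper XML parser; the regex only keeps the sketch
short.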


*From Static File Parsing to C++ PoCs*

Given the complexities of parsing static RDB/IDL files directly, and the
clear guidance from Meeting 3, our immediate focus has shifted. The new
priority is to write C++ Proof-of-Concept (PoC) code to programmatically
gather data and get this code onto Gerrit for review.

I'm very excited to share that the first two PoCs are complete.
Gerrit Patch: https://gerrit.libreoffice.org/c/core/+/186475
This patch contains the CppUnit tests for these experiments.

*UNO Services and Memes - Why Context Comes First*
I’ve seen this happen a lot on social media. There’s a meme going around,
people are laughing, sharing it, reacting to it… and then there’s always
someone in the comments asking:
"What’s the context behind this?"

I mean, I’ve done it too. Sometimes you just miss the reference: maybe it’s
from a movie, or some political moment, or even a viral soundbite. Without
the context, it’s just a picture or a clip. You don’t get why it’s funny,
why it hits.

*And then someone replies and goes:*
"Oh, this is from Interstellar, that scene where Cooper watches years of
messages after time dilation."

Now it starts to click. *That context sets the stage*.

*Then maybe another reply adds:*
"Yeah, and the reason it’s funny here is because someone compared it to
missing one lecture and coming back to find the whole syllabus changed."

So first you got the context, then someone gave the reference point (say,
the movie), and then you dove into the details: the exact scene, the
emotion, the punchline. That’s what makes it all land.

And honestly, that’s how I see working with UNO services too.

In our PoC, we first had to get the component context; otherwise we’re just
floating, not grounded in the current state of the app. Once we had that, we
could ask for something like com.sun.star.reflection.CoreReflection, and
only then could we start introspecting the real details: interfaces,
methods, enums, all the building blocks.

*It’s kind of beautiful how that maps:*
*Context* → *“Where am I?”*
*Service* → *“What am I working with?”*
*Introspection* → *“What can this thing do?”*

And just like in memes, without context, the rest doesn’t mean much.
Funny enough, this whole idea of “context” is even a thing in React and in
Java frameworks. So maybe context is more universal than we think.

*Summary of C++ Proof-of-Concepts (PoCs)*
Here's a breakdown of the PoCs I've implemented in the Gerrit patch:

*PoC 1: Listing All Available UNO Service Names*
    *Concept:* Queries the *XMultiComponentFactory* (Service Manager) to get
    all creatable UNO service names.
    *Source:* comphelper/processfactory.hxx (getProcessServiceManager()).
    *Task:*
      -  Get XComponentContext.
      -  Get XMultiComponentFactory.
      -  Call getAvailableServiceNames().
      -  Log each service name.
    *Result:* Successfully dumped service names.
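
In the PoC the list comes from getAvailableServiceNames(). As a rough idea
of how a completion engine might consume such a dump, here is a minimal
sketch of prefix lookup over a sorted list of names. The helper
completeByPrefix is hypothetical and not part of the Gerrit patch; it only
assumes the names have been copied into a sorted std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Return every name in `sorted` that starts with `prefix`.
// Precondition: `sorted` is in ascending order (as produced by std::sort).
std::vector<std::string> completeByPrefix(const std::vector<std::string>& sorted,
                                          const std::string& prefix)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    // All names sharing the prefix sit in one contiguous run that starts
    // at the first element not less than the prefix itself.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), prefix);
    std::vector<std::string> hits;
    for (; it != sorted.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
        hits.push_back(*it);
    return hits;
}
```

Binary search keeps each lookup cheap even for a long service list; a trie
would be another option if lookups happen on every keystroke.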

*PoC 2: Introspecting Specific UNO Definitions via theCoreReflection*
    *Concept:* *theCoreReflection* provides access to the complete in-memory
    type information that LibreOffice loaded from its RDBs.
    *Source:* com.sun.star.reflection.theCoreReflection, XIdlClass, etc.
    (implementation in stoc/source/
    <https://git.libreoffice.org/core/+/refs/heads/master/stoc>).
    *Task:*
        - Get theCoreReflection instance.
        - For each of a list of key type names (XModel, XSpreadsheet,
          PropertyValue, etc.):
            - Call forName(sTypeName) to get its XIdlClass blueprint.
            - Dump all details: superclasses, methods (with full parameter
              info), properties, struct fields, and enum members.
    *Result:* Extracted rich, detailed API definitions. This proves we can
    get the data needed for Parameter Info and accurate dot-completion.
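
To illustrate how reflected method data could feed a Parameter Info
tooltip, here is a minimal sketch that turns a method description into a
one-line signature. The structs below (ParamDesc, MethodDesc) are
hypothetical stand-ins for what the PoC reads through the reflection
interfaces; they are not the UNO types themselves, and the output format is
just one possible choice.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plain-data mirror of reflected method information.
enum class ParamDirection { In, Out, InOut };

struct ParamDesc {
    std::string name;
    std::string typeName;
    ParamDirection direction;
};

struct MethodDesc {
    std::string name;
    std::string returnType;
    std::vector<ParamDesc> params;
};

std::string directionLabel(ParamDirection d)
{
    switch (d) {
        case ParamDirection::In:    return "in";
        case ParamDirection::Out:   return "out";
        case ParamDirection::InOut: return "inout";
    }
    return "in"; // not reached for valid enum values
}

// Build "name([dir] type param, ...) : returnType".
std::string formatSignature(const MethodDesc& m)
{
    std::string out = m.name + "(";
    for (std::size_t i = 0; i < m.params.size(); ++i) {
        if (i != 0)
            out += ", ";
        const ParamDesc& p = m.params[i];
        out += "[" + directionLabel(p.direction) + "] " + p.typeName + " " + p.name;
    }
    out += ") : " + m.returnType;
    return out;
}
```

For a made-up method doSomething taking an in string Name and an out long
Count and returning void, this yields
"doSomething([in] string Name, [out] long Count) : void".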

https://gerrit.libreoffice.org/c/core/+/186475/4/basic/uno_available_services_cpp_dump.txt

*Next Steps: Diving into BASIC Internals*

With the UNO data access path validated, the next focus is on BASIC itself.

    *PoC 3 (In Progress): The MsgBox Deep Dive*
        My current task is to trace *MsgBox* from its user-facing
        documentation (both LO and MSO) down to its C++ implementation
        (*SbRtl_MsgBox in basic/source/runtime/methods.cxx*). This will help
        us understand how to handle built-in functions and their
        often-implicit parameter signatures.

    *Future PoC: Parser Symbol Extraction*
        After MsgBox, the plan is to write a C++ PoC that interacts with the
        SbiParser to extract its internal symbol tables (SbiSymPool) for
        user-defined code.
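
As a rough picture of what symbol extraction should eventually deliver,
here is a toy sketch that scans BASIC source text for Sub and Function
declarations. It is only a line-based regex stand-in: it does not use
SbiParser or SbiSymPool, and the names ProcSymbol and scanProcedures are
invented for this example. The real PoC would read the parser's own tables
instead.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct ProcSymbol {
    std::string kind; // "Sub" or "Function", spelled as in the source
    std::string name;
};

// Find procedure declarations, one per line. Lines such as "End Sub" do
// not match because the keyword must come first (after an optional
// Public/Private) and be followed by an identifier.
std::vector<ProcSymbol> scanProcedures(const std::string& source)
{
    static const std::regex declRe(
        R"(^[ \t]*(?:(?:Public|Private)[ \t]+)?(Sub|Function)[ \t]+([A-Za-z_][A-Za-z0-9_]*))",
        std::regex::icase);
    std::vector<ProcSymbol> symbols;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (std::regex_search(line, m, declRe))
            symbols.push_back({m[1].str(), m[2].str()});
    }
    return symbols;
}
```

A scanner like this misses parameters, types, and scoping, which is exactly
the information the parser's symbol tables are expected to provide.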

A mentor's comment, *"We have a cppumaker, etc., and why not a basicmaker?"*,
really resonated with me. It highlights that our ultimate goal is to create
a powerful "analyzer" for BASIC that provides the same level of rich,
structured information for our IDE tools as other "makers" do for their
respective languages. And yes, I have to speed things up.

Thanks for following this.

-- 
*Regards,*
*Devansh*