GSoC 25: BASIC IDE - Insights from Data Discovery & C++ PoCs [WEEK 4]

Regis Perdreau regis.perdreau at gmail.com
Mon Jun 16 12:04:37 UTC 2025


Hi

https://api.libreoffice.org/docs/tools.html
"'unoidl-write' is the new UNOIDL compiler, replacing the former idlc and
regmerge tools. "

So are idlc and regmerge still relevant?

Best,


Régis Perdreau



On Mon, 16 Jun 2025 at 13:38, Regis Perdreau <regis.perdreau at gmail.com>
wrote:

> Hi all,
>
> Thanks for this work.
>
> Best,
>
> Régis Perdreau
>
>
>
> On Mon, 16 Jun 2025 at 13:21, Devansh Varshney <
> varshney.devansh614 at gmail.com> wrote:
>
>> Hi everyone,
>>
>> This week has been one of the best learning experiences for me,
>> especially digging into the "behind-the-scenes" of LibreOffice's UNO
>> APIs.
>>
>> My initial work (Gerrit 185362 <https://gerrit.libreoffice.org/c/core/+/185362>)
>> was a first step, but feedback from my mentors in our meetings provided a
>> crucial directive: first, figure out how to get the data. Before we can
>> build a great auto-completion system, we need a deep, proven understanding
>> of where all the information (for BASIC, UNO, ScriptForge, etc.) lives and
>> how to access it programmatically.
>>
>> This led to a fascinating dive into the UNO data pipeline.
>>
>> *Understanding the UNO Data Pipeline: From IDL to Runtime*
>> For anyone curious about how UNO works under the hood, here's a breakdown of
>> what I've learned. It's a pipeline that turns human-readable API definitions
>> into an efficient system the application uses at runtime.
>>
>>     *IDL (Interface Definition Language):* This is the source of truth for
>>     all UNO APIs. These .idl text files define every service, interface,
>>     method, property, struct, and enum.
>>         *Locations:* udkapi/ (core types) and offapi/ (office-specific types).
>>
>>     *idlc & regmerge:* During the build, idlc (the IDL Compiler)
>>     compiles .idl files into intermediate binary .urd files. Then,
>>     regmerge combines these into .rdb (Registry Database) files.
>>
>>     *.rdb Files:* These are the optimized binary databases that LibreOffice
>>     loads at startup. Key files include types.rdb (from udkapi.rdb etc.),
>>     services.rdb, and offapi.rdb. These are installation artifacts, not
>>     source files, which clarified my initial search!
>>
>>     *theCoreReflection:* At runtime, this UNO singleton provides live,
>>     programmatic access to all the type information that was loaded from
>>     the .rdb files.
>>
>>     *regview Tool:* A command-line tool (registry/tools/regview.cxx)
>>     designed to dump the contents of an .rdb file. My initial attempts to
>>     use it were unsuccessful, which, along with mentor guidance, led us to
>>     pivot our strategy.
>>
>>     *SbUnoObject & XIntrospectionAccess:* The bridge in BASIC for
>>     interacting with live UNO objects, using dynamic introspection to
>>     discover their capabilities.
>>
>> *A simplified flow of this pipeline looks like this:*
>>
>>  .idl Files     --(idlc)-->    .urd Files    --(regmerge)-->    .rdb Files
>>  (Source of Truth)        (Binary intermediate)         (Loaded by LO Runtime)
>>                                                                   |
>>                                              +--------------------+
>>                                              |                    |
>>                                              v                    v (Reads .rdb)
>>                                <LO Runtime Type System>      regview Tool
>>                          (Accessible via theCoreReflection)       |
>>                                                                   v
>>                                                             <Textual Dump>
>>
>>
>> *Understanding ScriptForge (wizards/source/scriptforge/)*
>>
>> I also looked into ScriptForge, which is crucial for modern BASIC scripting
>> (https://gerrit.libreoffice.org/c/core/+/164867):
>>    - *.xlb files* are XML manifests listing the modules of a library.
>>    - *.xba files* are XML files, each holding the source code of one Basic
>>      module.
>>    - *.pyi file* is a Python stub that provides type hints to Python IDEs
>>      for auto-completion. As Rafael Lima mentioned, this might be manually
>>      created, making it a great model for the kind of rich API definition
>>      we want to achieve for BASIC.
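This minimal C++ sketch makes the .xlb manifest's role concrete by listing the module names a manifest declares. It assumes the `<library:element library:name="..."/>` layout used by ScriptForge's script.xlb. It deliberately uses a naive string scan instead of a real XML parser. `listXlbElements` is a hypothetical helper, not a LibreOffice API.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a LibreOffice API): collect the library:name value
// of every <library:element .../> entry in an .xlb manifest.
// Naive string scan, not an XML parser: it assumes the attribute is written
// exactly as library:name="..." with double quotes.
std::vector<std::string> listXlbElements(const std::string& xlb)
{
    const std::string elemTag = "<library:element";
    const std::string nameAttr = "library:name=\"";
    std::vector<std::string> names;
    std::size_t pos = 0;
    while ((pos = xlb.find(elemTag, pos)) != std::string::npos)
    {
        const std::size_t tagEnd = xlb.find('>', pos);
        if (tagEnd == std::string::npos)
            break; // truncated input
        // Only accept the attribute if it belongs to this element's tag.
        const std::size_t attr = xlb.find(nameAttr, pos);
        if (attr != std::string::npos && attr < tagEnd)
        {
            const std::size_t start = attr + nameAttr.size();
            const std::size_t end = xlb.find('"', start);
            if (end != std::string::npos && end < tagEnd)
                names.push_back(xlb.substr(start, end - start));
        }
        pos = tagEnd;
    }
    return names;
}
```

The root `<library:library ... library:name="ScriptForge">` tag is not matched, because the scan only looks for `<library:element`.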
>>
>> *How its information becomes available:*
>>
>> *Basic module source (inside .xba files listed in .xlb)*
>>                     |
>>                     v (Loaded by BasicManager/StarBASIC)
>> *<SbModule objects with source code>*
>>                     |
>>                     v (Compiled by SbiParser)
>> *<SbMethod, SbxVariable symbols within the SbModule>*
>>
>>             *--- Parallel path for Python tooling ---*
>> *.pyi file (wizards/source/scriptforge/python/scriptforge.pyi)*
>>                     |
>>                     v (Read by Python IDEs)
>> *<Type hints for Python auto-completion>*
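The first step of the BASIC path, getting from an .xba file to Basic source text, can be sketched in standard C++ for offline tooling. Inside LibreOffice, the BasicManager step above already does this loading. The sketch assumes the module source is stored as XML-escaped text inside a single `<script:module ...>` element. `extractXbaSource` and `unescapeXml` are hypothetical helpers.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: decode the five predefined XML entities in one
// left-to-right pass, so "&amp;lt;" correctly becomes "&lt;".
std::string unescapeXml(const std::string& s)
{
    static const std::pair<const char*, char> entities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}, {"&amp;", '&'}};
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();)
    {
        bool matched = false;
        if (s[i] == '&')
        {
            for (const auto& [name, ch] : entities)
            {
                const std::string ent(name);
                if (s.compare(i, ent.size(), ent) == 0)
                {
                    out += ch;
                    i += ent.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched)
            out += s[i++];
    }
    return out;
}

// Hypothetical helper: return the Basic source stored in the
// <script:module ...> element of an .xba file, or std::nullopt if absent.
// Naive: assumes no '>' inside the element's attribute values.
std::optional<std::string> extractXbaSource(const std::string& xba)
{
    const std::size_t open = xba.find("<script:module");
    if (open == std::string::npos)
        return std::nullopt;
    const std::size_t bodyStart = xba.find('>', open);
    const std::size_t close = xba.find("</script:module>", open);
    if (bodyStart == std::string::npos || close == std::string::npos || close < bodyStart)
        return std::nullopt;
    return unescapeXml(xba.substr(bodyStart + 1, close - bodyStart - 1));
}
```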
>>
>>
>> *From Static File Parsing to C++ PoCs*
>>
>> Given the complexities of parsing static RDB/IDL files directly, and the
>> clear guidance from Meeting 3, our immediate focus has shifted. The new
>> priority is to write C++ Proof-of-Concept (PoC) code to programmatically
>> gather data and get this code onto Gerrit for review.
>>
>> I'm very excited to share that the first two PoCs are complete.
>> Gerrit Patch: https://gerrit.libreoffice.org/c/core/+/186475
>> This patch contains the CppUnit tests for these experiments.
>>
>> *UNO Services and Memes - Why Context Comes First*
>> For example, I’ve seen this happen a lot on social media. There’s a meme
>> going around, people are laughing, sharing it, reacting to it… and then
>> there’s always someone in the comments asking:
>> "What’s the context behind this?"
>>
>> I mean, I’ve done it too. Sometimes you just miss the reference; maybe it’s
>> from a movie, or some political moment, or even a viral soundbite. Without
>> the context, it’s just a picture or a clip. You don’t get why it’s funny or
>> why it hits.
>>
>> *And then someone replies and goes:*
>> "Oh, this is from Interstellar, that scene where Cooper watches years of
>> messages after time dilation."
>>
>> Now it starts to click. *That context sets the stage*.
>>
>> *Then maybe another reply adds:*
>> "Yeah, and the reason it’s funny here is because someone compared it to
>> missing one lecture and coming back to find the whole syllabus changed."
>>
>> So first you got the context, then someone gave the reference point (say,
>> the movie), and then you dove into the details: the exact scene, the
>> emotion, the punchline. That’s what makes it all land.
>>
>> And honestly, that’s how I see working with UNO services too.
>>
>> In our PoC, we first had to get the component context; otherwise we’re just
>> floating, not grounded in the current state of the app. Once we had that,
>> we could ask for something like com.sun.star.reflection.theCoreReflection,
>> and only then could we start introspecting the real details: interfaces,
>> methods, enums, all the building blocks.
>>
>> *It’s kind of beautiful how that maps:*
>> *Context* → *“Where am I?”*
>> *Service* → *“What am I working with?”*
>> *Introspection* → *“What can this thing do?”*
>>
>> And just like in memes, without context, the rest doesn’t mean much.
>> Funnily enough, this whole idea of “context” also shows up in React and in
>> Java frameworks. So maybe context is more universal than we think.
>>
>> *Summary of C++ Proof-of-Concepts (PoCs)*
>> Here's a breakdown of the PoCs I've implemented in the Gerrit patch:
>>
>> *PoC 1: Listing All Available UNO Service Names*
>>     *Concept:* Queries the *XMultiComponentFactory* (Service Manager) to get
>>     all creatable UNO service names.
>>     *Source:* comphelper/processfactory.hxx (getProcessServiceManager()).
>>     *Task:*
>>       - Get XComponentContext.
>>       - Get XMultiComponentFactory.
>>       - Call getAvailableServiceNames().
>>       - Log each service name.
>>     *Result:* Successfully dumped service names.
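Once getAvailableServiceNames() has produced the list, one way it could feed dot-completion is to offer the next namespace segment for whatever the user has typed so far. This minimal standard-C++ sketch assumes the names have already been converted from OUString to std::string. `completeSegment` is a hypothetical helper, not part of the PoC.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: given fully qualified names such as
// "com.sun.star.sheet.SpreadsheetDocument" and the text typed so far
// (e.g. "com.sun.star.sh"), return the distinct candidates for the
// segment currently being typed.
std::set<std::string> completeSegment(const std::vector<std::string>& names,
                                      const std::string& typed)
{
    // Everything up to and including the last dot is already "committed".
    const std::size_t lastDot = typed.rfind('.');
    const std::size_t committed = (lastDot == std::string::npos) ? 0 : lastDot + 1;

    std::set<std::string> candidates;
    for (const std::string& name : names)
    {
        if (name.compare(0, typed.size(), typed) != 0)
            continue; // does not start with the typed text
        // Candidate = the segment after the committed prefix, up to the next dot.
        const std::size_t segEnd = name.find('.', committed);
        const std::size_t len = (segEnd == std::string::npos) ? std::string::npos : segEnd - committed;
        candidates.insert(name.substr(committed, len));
    }
    return candidates;
}
```

Suppose the list contains "com.sun.star.sheet.SpreadsheetDocument" and "com.sun.star.frame.Desktop". Typing "com.sun.star." then yields {"frame", "sheet"}. A std::set keeps the candidates de-duplicated and sorted, which suits a completion popup.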
>>
>> *PoC 2: Introspecting Specific UNO Definitions via theCoreReflection*
>>     *Concept:* *theCoreReflection* provides access to the complete in-memory
>>     type information that LibreOffice loaded from its RDBs.
>>     *Source:* com.sun.star.reflection.theCoreReflection, XIdlClass, etc.
>>     (implementation in stoc/source/
>>     <https://git.libreoffice.org/core/+/refs/heads/master/stoc>).
>>     *Task:*
>>         - Get the theCoreReflection singleton.
>>         - For each of a list of key type names (XModel, XSpreadsheet,
>>           PropertyValue, etc.):
>>             - Call forName(sTypeName) to get its XIdlClass blueprint.
>>             - Dump all details: superclasses, methods (with full parameter
>>               info), properties, struct fields, and enum members.
>>     *Result:* Extracted rich, detailed API definitions. This proves we can
>>     get the data needed for Parameter Info and accurate dot-completion.
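This sketch shows how PoC 2's output could turn into a Parameter Info tooltip by rendering an IDL-style signature. The structs are hypothetical plain-C++ mirrors of what XIdlMethod reports: its name, its return type, and ParamInfo entries with a name, a ParamMode, and a type. The conversion from the real UNO types is deliberately left out.

```cpp
#include <string>
#include <vector>

// Hypothetical mirrors of XIdlMethod / ParamInfo data (not UNO types).
enum class ParamMode { In, Out, InOut };

struct ParamDesc
{
    std::string name;
    ParamMode mode;
    std::string typeName;
};

struct MethodDesc
{
    std::string name;
    std::string returnType;
    std::vector<ParamDesc> params;
};

// Render an IDL-style signature such as "any getByName([in] string aName)".
std::string formatSignature(const MethodDesc& m)
{
    std::string out = m.returnType + " " + m.name + "(";
    for (std::size_t i = 0; i < m.params.size(); ++i)
    {
        const ParamDesc& p = m.params[i];
        if (i > 0)
            out += ", ";
        switch (p.mode)
        {
            case ParamMode::In:    out += "[in] ";    break;
            case ParamMode::Out:   out += "[out] ";   break;
            case ParamMode::InOut: out += "[inout] "; break;
        }
        out += p.typeName + " " + p.name;
    }
    return out + ")";
}
```

Given a MethodDesc filled in for XNameAccess::getByName (return type "any", one `[in] string aName` parameter), this prints `any getByName([in] string aName)`. A BASIC-flavoured renderer ("... As String") would only need a different formatting function over the same data.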
>>
>>
>> https://gerrit.libreoffice.org/c/core/+/186475/4/basic/uno_available_services_cpp_dump.txt
>>
>> *Next Steps: Diving into BASIC Internals*
>>
>> With the UNO data access path validated, the next focus is on BASIC itself.
>>
>>     *PoC 3 (In Progress): The MsgBox Deep Dive*
>>         My current task is to trace *MsgBox* from its user-facing
>>         documentation (both LO and MSO) down to its C++ implementation
>>         (*SbRtl_MsgBox in basic/source/runtime/methods.cxx*). This will help
>>         us understand how to handle built-in functions and their
>>         often-implicit parameter signatures.
>>
>>     *Future PoC: Parser Symbol Extraction*
>>         After MsgBox, the plan is to write a C++ PoC that interacts with the
>>         SbiParser to extract its internal symbol tables (SbiSymPool) for
>>         user-defined code.
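Until the SbiParser PoC exists, a rough stand-in for user-defined symbols is a line-based scan for Sub/Function headers. This is explicitly *not* how SbiParser or SbiSymPool work. It does not handle line continuations (` _`) or type-suffix characters such as `$` on names. `scanProcedures` and `BasicProc` are hypothetical names. It might still serve as a baseline to compare the real symbol-table output against.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one procedure header found in Basic source.
struct BasicProc
{
    std::string kind;   // "Sub" or "Function", as written in the source
    std::string name;
    std::string params; // raw text between the parentheses ("" if none)
};

// Naive, line-based scan (not SbiParser): optional Public/Private/Static
// modifiers, then Sub or Function, a name, and an optional parameter list.
// Basic keywords are case-insensitive, hence regex::icase.
std::vector<BasicProc> scanProcedures(const std::string& source)
{
    static const std::regex header(
        R"(^[ \t]*(?:(?:Public|Private|Static)[ \t]+)*(Sub|Function)[ \t]+([A-Za-z_][A-Za-z0-9_]*)[ \t]*(?:\(([^)]*)\))?)",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<BasicProc> procs;
    std::istringstream in(source);
    std::string line;
    std::smatch m;
    while (std::getline(in, line))
    {
        // "End Sub", "Exit Function" and ordinary statements do not match,
        // because the keyword must start the line (after modifiers).
        if (std::regex_search(line, m, header))
            procs.push_back({m[1].str(), m[2].str(), m[3].str()});
    }
    return procs;
}
```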
>>
>> A mentor's comment, *"We have a cppumaker, etc., and why not a basicmaker?"*,
>> really resonated with me. It highlights that our ultimate goal is to create
>> a powerful "analyzer" for BASIC that provides the same level of rich,
>> structured information for our IDE tools as other "makers" do for their
>> respective languages. And yes, I need to speed things up.
>>
>> Thanks for following this.
>>
>> --
>> *Regards,*
>> *Devansh*
>>
>


More information about the LibreOffice mailing list