GSoC 25: BASIC IDE - Insights from Data Discovery & C++ PoCs [WEEK 4]
Regis Perdreau
regis.perdreau at gmail.com
Mon Jun 16 12:04:37 UTC 2025
Hi
https://api.libreoffice.org/docs/tools.html
"'unoidl-write' is the new UNOIDL compiler, replacing the former idlc and
regmerge tools."
So are idlc and regmerge still relevant?
Best,
Régis Perdreau
On Mon, 16 Jun 2025 at 13:38, Regis Perdreau <regis.perdreau at gmail.com>
wrote:
> Hi all,
>
> Thanks for this work.
>
> Best,
>
> Régis Perdreau
>
>
>
> On Mon, 16 Jun 2025 at 13:21, Devansh Varshney <
> varshney.devansh614 at gmail.com> wrote:
>
>> Hi everyone,
>>
>> This week has been one of the best learning experiences for me,
>> especially digging into the "behind-the-scenes" of LibreOffice's UNO
>> APIs.
>>
>> My initial work (Gerrit 185362
>> <https://gerrit.libreoffice.org/c/core/+/185362>) was a first step, but
>> feedback from my
>> mentors in our meetings provided a crucial directive: first, figure out
>> how to get the data. Before we can build a great auto-completion system,
>> we need a deep, proven understanding of where all the information (for
>> BASIC, UNO, ScriptForge, etc.) lives and how to access it
>> programmatically.
>>
>> This led to a fascinating dive into the UNO data pipeline.
>>
>> *Understanding the UNO Data Pipeline: From IDL to Runtime*
>> For anyone curious about how UNO works under the hood, here's a breakdown
>> of what I've learned. It's a pipeline that turns human-readable API
>> definitions into an efficient system the application uses at runtime.
>>
>> *IDL (Interface Definition Language):* This is the source of truth for
>> all UNO APIs. These .idl text files define every service, interface,
>> method, property, struct, and enum.
>> Locations: *udkapi/* (core types) and *offapi/* (office-specific types).
>>
>> *idlc & regmerge:* During the build, idlc (the IDL compiler) compiles
>> .idl files into intermediate binary .urd files. Then regmerge combines
>> these into .rdb (Registry Database) files.
>>
>> *.rdb Files:* These are the optimized binary databases that LibreOffice
>> loads at startup. Key files include types.rdb (from udkapi.rdb etc.),
>> services.rdb, and offapi.rdb. These are installation artifacts, not
>> source files, which clarified my initial search!
>>
>> *theCoreReflection:* At runtime, this powerful UNO service provides
>> live, programmatic access to all the type information that was loaded
>> from the .rdb files.
>>
>> *regview Tool:* A command-line tool (registry/tools/regview.cxx)
>> designed to dump the contents of an .rdb file. My initial attempts to
>> use it were unsuccessful, which, along with mentor guidance, led us to
>> pivot our strategy.
>>
>> *SbUnoObject & XIntrospectionAccess:* The bridge in BASIC for
>> interacting with live UNO objects, using dynamic introspection to
>> discover their capabilities.
>>
>> *A simplified flow of this pipeline looks like this:*
>>
>> *.idl Files* --(idlc)--> *.urd Files* --(regmerge)--> *.rdb Files*
>> (Source of Truth)        (Binary intermediate)        (Loaded by LO Runtime)
>>                                                              |
>>                                                              v
>>                                                  <LO Runtime Type System>
>>                                            (Accessible via theCoreReflection)
>>
>> *.rdb Files*
>>       ^
>>       | (Reads .rdb)
>> *regview Tool*
>>       |
>>       v
>> <Textual Dump>
>>
>>
>>
>>
>> *Understanding ScriptForge (wizards/source/scriptforge/)*
>>
>> I also looked into ScriptForge, which is crucial for modern BASIC
>> scripting.
>> https://gerrit.libreoffice.org/c/core/+/164867
>> - *.xlb files* are XML manifests (script.xlb, dialog.xlb) listing the
>> modules and dialogs that make up a library.
>> - *.xba files* are XML files, each wrapping the BASIC source of a
>> single module.
>> - *.pyi file* is a Python stub that provides type hints to Python IDEs
>> for auto-completion. As Rafael Lima mentioned, this might be manually
>> created, making it a great model for the kind of rich API definition we
>> want to achieve for BASIC.
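
A minimal sketch of reading a library manifest, assuming script.xlb lists its modules as `library:element` entries carrying a `library:name` attribute. The function name `listModuleNames` and the abbreviated sample `kSampleXlb` are hypothetical, and the naive string scan only stands in for a real XML parser:

```cpp
#include <string>
#include <vector>

// Collect the library:name attribute of every <library:element .../> entry.
// Deliberately naive string scanning: no entity decoding, no namespace
// handling, no validation. A real tool should use a proper XML parser.
std::vector<std::string> listModuleNames(const std::string& xlb)
{
    const std::string elementTag = "<library:element";
    const std::string nameAttr = "library:name=\"";
    std::vector<std::string> names;

    std::size_t pos = 0;
    while ((pos = xlb.find(elementTag, pos)) != std::string::npos)
    {
        const std::size_t tagEnd = xlb.find('>', pos);
        if (tagEnd == std::string::npos)
            break; // truncated input: stop instead of guessing
        const std::size_t attr = xlb.find(nameAttr, pos);
        if (attr != std::string::npos && attr < tagEnd)
        {
            const std::size_t start = attr + nameAttr.size();
            const std::size_t end = xlb.find('"', start);
            if (end != std::string::npos && end < tagEnd)
                names.push_back(xlb.substr(start, end - start));
        }
        pos = tagEnd + 1;
    }
    return names;
}

// Abbreviated sample in the assumed shape of a script.xlb file.
const std::string kSampleXlb = R"(<?xml version="1.0" encoding="UTF-8"?>
<library:library xmlns:library="http://openoffice.org/2000/library" library:name="ScriptForge" library:readonly="false" library:passwordprotected="false">
 <library:element library:name="SF_Array"/>
 <library:element library:name="SF_String"/>
</library:library>)";
```

Calling `listModuleNames(kSampleXlb)` yields the two module names in document order. Each name would then point at the matching .xba file to load.
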
>>
>> *How its information becomes available:*
>>
>> *BASIC source (inside .xba files listed in .xlb)*
>> |
>> v (Loaded by BasicManager/StarBASIC)
>> *<SbModule objects with source code>*
>> |
>> v (Compiled by SbiParser)
>> *<SbMethod, SbxVariable symbols within the SbModule>*
>>
>> *--- Parallel path for Python tooling ---*
>> *.pyi file (wizards/source/scriptforge/python/scriptforge.pyi)*
>> |
>> v (Read by Python IDEs)
>> *<Type hints for Python auto-completion>*
>>
>>
>> *From Static File Parsing to C++ PoCs*
>>
>> Given the complexities of parsing static RDB/IDL files directly, and the
>> clear guidance from Meeting 3, our immediate focus has shifted. The new
>> priority is to write C++ Proof-of-Concept (PoC) code to programmatically
>> gather data and get this code onto Gerrit for review.
>>
>> I'm very excited to share that the first two PoCs are complete.
>> Gerrit Patch: https://gerrit.libreoffice.org/c/core/+/186475
>> This patch contains the CppUnit tests for these experiments.
>>
>> *UNO Services and Memes - Why Context Comes First*
>> I’ve seen this happen a lot on social media. There’s a meme going around,
>> people are laughing, sharing it, reacting to it… and then there’s always
>> someone in the comments asking:
>> "What’s the context behind this?"
>>
>> I mean, I’ve done it too. Sometimes you just miss the reference: maybe
>> it’s from a movie, or some political moment, or even a viral soundbite.
>> Without the context, it’s just a picture or a clip. You don’t get why
>> it’s funny, why it hits.
>>
>> *And then someone replies and goes:*
>> "Oh, this is from Interstellar, that scene where Cooper watches years of
>> messages after time dilation."
>>
>> Now it starts to click. *That context sets the stage*.
>>
>> *Then maybe another reply adds:*
>> "Yeah, and the reason it’s funny here is because someone compared it to
>> missing one lecture and coming back to find the whole syllabus changed."
>>
>> So first you got the context, then someone gave the reference point (say,
>> the movie), and then you dove into the details: the exact scene, the
>> emotion, the punchline. That’s what makes it all land.
>>
>> And honestly, that’s how I see working with UNO services too.
>>
>> In our PoC, we had to get the component context first; otherwise we’re
>> just floating, not grounded in the current state of the app. Once we had
>> that, we could ask for something like
>> com.sun.star.reflection.CoreReflection, and only then could we start
>> introspecting the real details: interfaces, methods, enums, all the
>> building blocks.
>>
>> *It’s kind of beautiful how that maps:*
>> *Context* → *“Where am I?”*
>> *Service* → *“What am I working with?”*
>> *Introspection* → *“What can this thing do?”*
>>
>> And just like in memes, without context, the rest doesn’t mean much.
>> Funny enough, this whole idea of “context” is even a thing in frameworks
>> like
>> React or Java. So maybe context is more universal than we think.
>>
>> *Summary of C++ Proof-of-Concepts (PoCs)*
>> Here's a breakdown of the PoCs I've implemented in the Gerrit patch:
>>
>> *PoC 1: Listing All Available UNO Service Names*
>> *Concept:* Queries the *XMultiComponentFactory* (Service Manager) to get
>> all creatable UNO service names.
>> *Source:* comphelper/processfactory.hxx (getProcessServiceManager()).
>> *Task:*
>> - Get XComponentContext.
>> - Get XMultiComponentFactory.
>> - Call getAvailableServiceNames().
>> - Log each service name.
>> *Result:* Successfully dumped service names.
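
Once the service names are dumped, a completion popup could filter them by what the user has typed so far. This is a minimal sketch under that assumption, not code from the patch: `completePrefix` is a hypothetical helper, it works on a plain sorted `std::vector<std::string>` instead of a UNO sequence, and it matches case-sensitively (BASIC itself is case-insensitive, so a real implementation would need to normalize).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Return every name in sortedNames that starts with prefix.
// Precondition: sortedNames is sorted ascending (default std::sort order),
// so all matches form one contiguous run starting at lower_bound(prefix).
std::vector<std::string> completePrefix(const std::vector<std::string>& sortedNames,
                                        const std::string& prefix)
{
    std::vector<std::string> matches;
    auto it = std::lower_bound(sortedNames.begin(), sortedNames.end(), prefix);
    for (; it != sortedNames.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
        matches.push_back(*it);
    return matches;
}
```

With a sorted list containing com.sun.star.frame.Desktop, com.sun.star.reflection.CoreReflection, com.sun.star.sheet.SpreadsheetDocument and com.sun.star.text.TextDocument, the prefix "com.sun.star.s" narrows the result to the spreadsheet service only.
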
>>
>> *PoC 2: Introspecting Specific UNO Definitions via theCoreReflection*
>> *Concept:* *theCoreReflection* provides access to the complete in-memory
>> type information that LibreOffice loaded from its RDBs.
>> *Source:* com.sun.star.reflection.theCoreReflection, XIdlClass, etc.
>> (implementation in stoc/source/
>> <https://git.libreoffice.org/core/+/refs/heads/master/stoc>).
>> *Task:*
>> - Get theCoreReflection instance.
>> - For a list of key type names (XModel, XSpreadsheet, PropertyValue,
>> etc.):
>>    - Call forName(sTypeName) to get its XIdlClass blueprint.
>>    - Dump all details: superclasses, methods (with full parameter info),
>>    properties, struct fields, and enum members.
>> *Result:* Extracted rich, detailed API definitions. This proves we can
>> get the data needed for Parameter Info and accurate dot-completion.
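
To make the Parameter Info goal concrete, here is a minimal sketch of how dumped method data might be held and rendered as a one-line tooltip. It assumes nothing about the patch: `ParamInfo`, `MethodInfo` and `formatSignature` are hypothetical plain-data stand-ins, not UNO types, and the BASIC-style "As" rendering is only one possible display format.

```cpp
#include <string>
#include <vector>

// Hypothetical plain-data mirror of what a reflection dump reports per
// method. These are illustration-only structs, not part of the UNO API.
struct ParamInfo
{
    std::string name;
    std::string typeName;
};

struct MethodInfo
{
    std::string name;
    std::string returnType;
    std::vector<ParamInfo> params;
};

// Render a one-line signature such as an IDE tooltip might show.
std::string formatSignature(const MethodInfo& m)
{
    std::string out = m.name + "(";
    for (std::size_t i = 0; i < m.params.size(); ++i)
    {
        if (i != 0)
            out += ", ";
        out += m.params[i].name + " As " + m.params[i].typeName;
    }
    out += ") As " + m.returnType;
    return out;
}
```

Filled with the data for XModel's attachResource (a string URL plus a sequence of PropertyValue, returning boolean), this gives a single line listing both parameters with their types, which is the kind of hint the IDE would show while the user types the call.
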
>>
>>
>> https://gerrit.libreoffice.org/c/core/+/186475/4/basic/uno_available_services_cpp_dump.txt
>>
>> *Next Steps: Diving into BASIC Internals*
>>
>> With the UNO data access path validated, the next focus is on BASIC
>> itself.
>>
>> *PoC 3 (In Progress): The MsgBox Deep Dive*
>> My current task is to trace *MsgBox* from its user-facing documentation
>> (both LO and MSO) down to its C++ implementation
>> (*SbRtl_MsgBox in basic/source/runtime/methods.cxx*). This will help us
>> understand how to handle built-in functions and their often-implicit
>> parameter signatures.
>>
>> *Future PoC: Parser Symbol Extraction*
>> After MsgBox, the plan is to write a C++ PoC that interacts with the
>> SbiParser to extract its internal symbol tables (SbiSymPool) for
>> user-defined code.
>>
>> A mentor's comment, *"We have a cppumaker, etc., and why not a
>> basicmaker?"*, really resonated with me. It highlights that our ultimate
>> goal is to create a powerful "analyzer" for BASIC that provides the same
>> level of rich, structured information for our IDE tools as other "makers"
>> do for their respective languages. And yes, I need to pick up the pace.
>>
>> Thanks for following this.
>>
>> --
>> *Regards,*
>> *Devansh*
>>
>