GSoC 25: BASIC IDE - Insights from Data Discovery & C++ PoCs [WEEK 4]
Regis Perdreau
regis.perdreau at gmail.com
Mon Jun 16 11:38:06 UTC 2025
Hi all,
Thanks for this work.
Best,
Régis Perdreau
On Mon, 16 Jun 2025 at 13:21, Devansh Varshney <
varshney.devansh614 at gmail.com> wrote:
> Hi everyone,
>
> This week has been one of the best learning experiences for me,
> especially digging into the "behind-the-scenes" of LibreOffice's UNO APIs.
>
> My initial work (Gerrit 185362
> <https://gerrit.libreoffice.org/c/core/+/185362>) was a first step, but
> feedback from my
> mentors in our meetings provided a crucial directive: first, figure out
> how to get the data. Before we can build a great auto-completion system,
> we need a deep, proven understanding of where all the information (for
> BASIC, UNO, ScriptForge, etc.) lives and how to access it programmatically.
>
> This led to a fascinating dive into the UNO data pipeline.
>
> *Understanding the UNO Data Pipeline: From IDL to Runtime*
> For anyone curious about how UNO works under the hood, here's a breakdown
> of
> what I've learned. It's a pipeline that turns human-readable API
> definitions
> into an efficient system the application uses at runtime.
>
> *IDL* *(Interface Definition Language):* This is the source of truth for
> all UNO APIs. These .idl text files define every service, interface,
> method, property, struct, and enum.
> Locations: *udkapi/* (core types) & *offapi/* (office-specific types).
>
> *idlc & regmerge:* During the build, idlc (the IDL Compiler)
> compiles .idl files into intermediate binary .urd files. Then,
> regmerge combines these into .rdb (Registry Database) files.
>
> *.rdb Files:* These are the optimized binary databases that LibreOffice
> loads at startup. Key files include types.rdb (from udkapi.rdb etc.),
> services.rdb, and offapi.rdb. These are installation artifacts,
> not source files, which clarified my initial search!
>
> *theCoreReflection:* At runtime, this powerful UNO service provides
> live, programmatic access to all the type information that was loaded
> from the .rdb files.
>
> *regview Tool:* A command-line tool (registry/tools/regview.cxx)
> designed to dump the contents of an .rdb file. My initial attempts
> to use it were unsuccessful, which, along with mentor guidance, led
> us to pivot our strategy.
>
> *SbUnoObject & XIntrospectionAccess:* The bridge in BASIC for
> interacting with live UNO objects, using dynamic introspection to
> discover their capabilities.
>
> *A simplified flow of this pipeline looks like this:*
>
> *.idl Files* (Source of Truth)
>     |
>     v (idlc)
> *.urd Files* (Binary intermediate)
>     |
>     v (regmerge)
> *.rdb Files* (Loaded by LO Runtime)
>     |
>     v
> <LO Runtime Type System> (Accessible via theCoreReflection)
>
> *.rdb Files*
>     |
>     v (Read by regview Tool)
> <Textual Dump>
>
>
> *Understanding ScriptForge (wizards/source/scriptforge/)*
>
> I also looked into ScriptForge, which is crucial for modern BASIC
> scripting.
> https://gerrit.libreoffice.org/c/core/+/164867
> - *.xlb files* are XML manifests listing the modules of a library.
> - *.xba files* are XML files that each wrap the BASIC source of one
> module.
> - *.pyi file* is a Python stub that provides type hints to Python IDEs
> for
> auto-completion. As Rafael Lima mentioned, this might be manually
> created,
> making it a great model for the kind of rich API definition we want to
> achieve for BASIC.
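
As a rough illustration of reading such a manifest, here is a minimal sketch that pulls names out of an .xlb file's text. It assumes the manifest lists its entries as `<library:element library:name="..."/>` elements; the function name `listModules` is hypothetical, and this is a plain substring scan rather than a real XML parser, so it is only meant to show the shape of the data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect the library:name value of every <library:element .../> entry.
// Assumption: attribute values are double-quoted and contain no '>' or
// escaped quotes. A real tool should use a proper XML parser instead.
std::vector<std::string> listModules(const std::string& xlb)
{
    const std::string tag = "<library:element";
    const std::string attr = "library:name=\"";
    std::vector<std::string> names;
    std::size_t pos = 0;
    while ((pos = xlb.find(tag, pos)) != std::string::npos)
    {
        const std::size_t tagEnd = xlb.find('>', pos);
        if (tagEnd == std::string::npos)
            break; // truncated input, stop scanning
        const std::size_t nameStart = xlb.find(attr, pos);
        if (nameStart != std::string::npos && nameStart < tagEnd)
        {
            const std::size_t valueStart = nameStart + attr.size();
            const std::size_t valueEnd = xlb.find('"', valueStart);
            if (valueEnd != std::string::npos && valueEnd < tagEnd)
                names.push_back(xlb.substr(valueStart, valueEnd - valueStart));
        }
        pos = tagEnd + 1; // continue after this element
    }
    return names;
}
```

Calling `listModules` on the text of a manifest would return the listed names in file order; the root `<library:library ...>` element is skipped because only `<library:element` tags are matched.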
>
> *How its information becomes available:*
>
> *BASIC source (inside .xba module files listed in .xlb)*
> |
> v (Loaded by BasicManager/StarBASIC)
> *<SbModule objects with source code>*
> |
> v (Compiled by SbiParser)
> *<SbMethod, SbxVariable symbols within the SbModule>*
>
> *--- Parallel path for Python tooling ---*
> *.pyi file (wizards/source/scriptforge/python/scriptforge.pyi)*
> |
> v (Read by Python IDEs)
> *<Type hints for Python auto-completion>*
>
>
> *From Static File Parsing to C++ PoCs*
>
> Given the complexities of parsing static RDB/IDL files directly, and the
> clear guidance from Meeting 3, our immediate focus has shifted. The new
> priority is to write C++ Proof-of-Concept (PoC) code to programmatically
> gather data and get this code onto Gerrit for review.
>
> I'm very excited to share that the first two PoCs are complete.
> Gerrit Patch: https://gerrit.libreoffice.org/c/core/+/186475
> This patch contains the CppUnit tests for these experiments.
>
> *UNO Services and Memes - Why Context Comes First*
> Here's an example I’ve seen a lot on social media. There’s a meme going
> around, people are laughing, sharing it, reacting to it… and then there’s
> always someone in the comments asking:
> "What’s the context behind this?"
>
> I mean, I’ve done it too. Sometimes you just miss the reference, maybe it’s
> from a movie, or some political moment, or even a viral soundbite. Without
> the
> context, it’s just a picture or a clip. You don’t get why it’s funny, why
> it hits.
>
> *And then someone replies and goes:*
> "Oh, this is from Interstellar, that scene where Cooper watches years of
> messages after time dilation."
>
> Now it starts to click. *That context sets the stage*.
>
> *Then maybe another reply adds:*
> "Yeah, and the reason it’s funny here is because someone compared it to
> missing one lecture and coming back to find the whole syllabus changed."
>
> So first you got the context, then someone gave the reference point (say,
> the movie), and then you dove into the details: the exact scene, the
> emotion, the punchline. That’s what makes it all land.
>
> And honestly, that’s how I see working with UNO services too.
>
> In our PoC, we first had to get the component context; otherwise we’re just
> floating, not grounded in the current state of the app. Once we had that,
> we could ask for something like com.sun.star.reflection.CoreReflection, and
> only then could we start introspecting the real details: interfaces,
> methods, enums, all the building blocks.
>
> *It’s kind of beautiful how that maps:*
> *Context* → *“Where am I?”*
> *Service* → *“What am I working with?”*
> *Introspection* → *“What can this thing do?”*
>
> And just like in memes, without context, the rest doesn’t mean much.
> Funny enough, this whole idea of “context” is even a thing elsewhere, for
> example in React or in Java frameworks. So maybe context is more universal
> than we think.
>
> *Summary of C++ Proof-of-Concepts (PoCs)*
> Here's a breakdown of the PoCs I've implemented in the Gerrit patch:
>
> *PoC 1: Listing All Available UNO Service Names*
> *Concept:* Queries the *XMultiComponentFactory* (Service Manager) to
> get
> all creatable UNO service names.
> *Source:* comphelper/processfactory.hxx (getProcessServiceManager()).
> *Task:*
> - Get XComponentContext.
> - Get XMultiComponentFactory.
> - Call getAvailableServiceNames().
> - Log each service name.
> *Result:* Successfully dumped service names.
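
The four task steps above can be modelled in a few lines of standalone C++. This is only a sketch of the call order, not the code in the Gerrit patch: `MockContext` and `MockServiceManager` are hypothetical stand-ins for XComponentContext and XMultiComponentFactory so the example compiles without the LibreOffice tree, and `dumpServiceNames` is an invented helper name. Only the method names `getServiceManager()` and `getAvailableServiceNames()` mirror the real UNO calls.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the service manager (XMultiComponentFactory).
class MockServiceManager
{
public:
    explicit MockServiceManager(std::vector<std::string> names)
        : m_names(std::move(names)) {}

    // Same name as the UNO call; the real one returns a Sequence<OUString>.
    std::vector<std::string> getAvailableServiceNames() const { return m_names; }

private:
    std::vector<std::string> m_names;
};

// Hypothetical stand-in for the component context (XComponentContext).
class MockContext
{
public:
    explicit MockContext(MockServiceManager manager) : m_manager(std::move(manager)) {}
    const MockServiceManager& getServiceManager() const { return m_manager; }

private:
    MockServiceManager m_manager;
};

// Context -> service manager -> service names -> one log line per name.
std::vector<std::string> dumpServiceNames(const MockContext& context)
{
    std::vector<std::string> names = context.getServiceManager().getAvailableServiceNames();
    std::sort(names.begin(), names.end()); // sorted output is easier to diff
    std::vector<std::string> log;
    for (const std::string& name : names)
        log.push_back("service: " + name);
    return log;
}
```

Sorting is a design choice of this sketch, not something the PoC is stated to do; it just keeps two dumps comparable line by line.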
>
> *PoC 2: Introspecting Specific UNO Definitions via theCoreReflection*
> *Concept:* *theCoreReflection* provides access to the complete
> in-memory
> type information that LibreOffice loaded from its RDBs.
> *Source*: com.sun.star.reflection.theCoreReflection, XIdlClass, etc.
> (implementation in stoc/source/
> <https://git.libreoffice.org/core/+/refs/heads/master/stoc>).
> *Task:*
> - Get theCoreReflection instance.
> - For a list of key type names (XModel, XSpreadsheet,
> PropertyValue, etc.):
> - Call forName(sTypeName) to get its XIdlClass blueprint.
> - Dump all details: superclasses, methods (with full parameter
> info),
> properties, struct fields, and enum members.
> *Result:* Extracted rich, detailed API definitions. This
> proves we can get the data needed for Parameter Info and accurate
> dot-completion.
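
To make the "Parameter Info" idea concrete, here is a minimal sketch of how extracted method data might be stored and rendered as a one-line tooltip. The types `ParamEntry` and `MethodEntry` and the function `formatSignature` are hypothetical names for illustration, loosely shaped after the name, type, and in/out mode that reflection reports for each parameter; the output format is an assumption, not what the PoC prints.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Direction of a parameter, matching the IDL [in]/[out]/[inout] markers.
enum class ParamMode { In, Out, InOut };

struct ParamEntry
{
    std::string name;
    std::string type;
    ParamMode mode;
};

struct MethodEntry
{
    std::string name;
    std::string returnType;
    std::vector<ParamEntry> params;
};

std::string modeLabel(ParamMode mode)
{
    switch (mode)
    {
        case ParamMode::In: return "in";
        case ParamMode::Out: return "out";
        case ParamMode::InOut: return "inout";
    }
    return "in"; // not reached for valid enum values
}

// Render one method as a single line, e.g. for a parameter-info tooltip.
std::string formatSignature(const MethodEntry& method)
{
    std::string text = method.name + "(";
    for (std::size_t i = 0; i < method.params.size(); ++i)
    {
        const ParamEntry& p = method.params[i];
        if (i != 0)
            text += ", ";
        text += "[" + modeLabel(p.mode) + "] " + p.type + " " + p.name;
    }
    text += ") : " + method.returnType;
    return text;
}
```

For a hand-written entry describing XModel's `attachResource` with a string `URL` and a `sequence<PropertyValue>` `Arguments`, both `[in]`, and a boolean return, this would yield a line listing both parameters followed by `: boolean`.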
>
>
> https://gerrit.libreoffice.org/c/core/+/186475/4/basic/uno_available_services_cpp_dump.txt
>
> *Next Steps: Diving into BASIC Internals*
>
> With the UNO data access path validated, the next focus is on BASIC itself.
>
> *PoC 3 (In Progress): The MsgBox Deep Dive*
> My current task is to trace *MsgBox* from its user-facing
> documentation
> (both LO and MSO) down to its C++ implementation
> (*SbRtl_MsgBox in basic/source/runtime/methods.cxx*). This will
> help
> us understand how to handle built-in functions and their
> often-implicit
> parameter signatures.
>
> *Future PoC: Parser Symbol Extraction*
> After MsgBox, the plan is to write a C++ PoC that interacts with
> the
> SbiParser to extract its internal symbol tables (SbiSymPool) for
> user-defined code.
>
> A mentor's comment, *"We have a cppumaker, etc., and why not a
> basicmaker?"*,
> really resonated with me. It highlights that our ultimate goal is to create
> a powerful "analyzer" for BASIC that provides the same level of rich,
> structured information for our IDE tools as other "makers" do for their
> respective languages. And yes, I have to speed things up.
>
> Thanks for following this.
>
> --
> *Regards,*
> *Devansh*
>