GSoC 25: BASIC IDE - Insights from Data Discovery & C++ PoCs [WEEK 4]

Devansh Varshney varshney.devansh614 at gmail.com
Fri Jun 20 08:05:26 UTC 2025


Hi everyone,

Last night, Prof. Lima asked a question about PoC2. He wondered
why forName returned null for services like
com.sun.star.style.StyleFamily and
com.sun.star.frame.Desktop. Now, it has hit me.

My initial thought was that theCoreReflection::forName()
fails for services because their definitions are in XML, not
binary .rdb files. However, I realized the issue is more
fundamental:

The forName() method is primarily designed for UNO type names,
not service names. In the codebase, it's used for type
introspection and reflection.

An example from basctl/source/basicide/baside2b.cxx (lines 2957-2967)
shows forName obtaining xClass for type introspection:

try
{
  // Get the base class for reflection:
  xClass = css::reflection::theCoreReflection::get(
    comphelper::getProcessComponentContext())->forName(sVarType);
}
catch( const Exception& )
{
  bCanComplete = false;
  return;
}

The method returns an XIdlClass object used to introspect
UNO types, providing access to their methods and fields for
code completion (lines 2974-2986 of the same file):

std::vector< OUString > UnoTypeCodeCompletetor::GetXIdlClassMethods() const
{
  std::vector< OUString > aRetVect;
  if( bCanComplete && ( xClass != nullptr ) )
  {
    const Sequence< Reference< reflection::XIdlMethod > > aMethods =
      xClass->getMethods();
    for( Reference< reflection::XIdlMethod > const & rMethod : aMethods )
    {
      aRetVect.push_back( rMethod->getName() );
    }
  }
  return aRetVect; // This is empty when cannot code complete
}


*Think of it this way:*
A *Type* (like an interface, struct, or enum) is a blueprint.
It's a concrete definition of a data structure or methods. Like
a "Spark Plug" part number (com.sun.star.beans.XPropertySet).
Types are registered in TypeDescriptionManager via
XHierarchicalNameAccess.

A *Service* (like com.sun.star.frame.Desktop) is a job
description or product specification. It's an abstract concept
that says, "A 'Desktop' must have these features/support these
blueprints (interfaces)." It doesn't define methods itself; it
references other blueprints. Like a "Car Engine" spec that
requires spark plugs, pistons, etc. Services are instantiated
through the service manager, e.g. via
XMultiServiceFactory::createInstance().

So, *theCoreReflection::forName()* looks up blueprints (Types),
not job descriptions (Services). When we call forName with a
service name like "com.sun.star.frame.Desktop", it fails
because no single, concrete blueprint exists with that name.
"Desktop" is a concept, an aggregation of many types/interfaces.
theCoreReflection is for type introspection, not service
discovery.
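
To make the blueprint vs. job-description picture concrete, here is a
minimal toy model in plain C++. It does not use the UNO API at all:
ToyRuntime, ToyTypeInfo, isKnownService and makeSampleRuntime are
hypothetical names invented for this sketch, and it only mirrors the
mental model described above (two separate lookup tables), not how
LibreOffice stores this data internally.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for what a reflected type exposes:
// a type name plus the names of its methods.
struct ToyTypeInfo {
    std::string name;
    std::vector<std::string> methods;
};

// Hypothetical model of two separate lookups. This is NOT the UNO API.
class ToyRuntime {
public:
    void registerType(const ToyTypeInfo& info) { types_[info.name] = info; }
    void registerService(const std::string& name) { services_.insert(name); }

    // Models forName(): it consults the type table only.
    std::optional<ToyTypeInfo> forName(const std::string& name) const {
        auto it = types_.find(name);
        if (it == types_.end())
            return std::nullopt; // plays the role of the "null" seen in PoC2
        return it->second;
    }

    // Models asking the service side whether a service name is known.
    bool isKnownService(const std::string& name) const {
        return services_.count(name) != 0;
    }

private:
    std::map<std::string, ToyTypeInfo> types_;
    std::set<std::string> services_;
};

// Builds a runtime holding one type and one service from the examples above.
inline ToyRuntime makeSampleRuntime() {
    ToyRuntime rt;
    rt.registerType({"com.sun.star.beans.XPropertySet",
                     {"getPropertyValue", "setPropertyValue"}});
    rt.registerService("com.sun.star.frame.Desktop");
    return rt;
}
```

With makeSampleRuntime(), forName("com.sun.star.beans.XPropertySet")
yields a value with two methods, while
forName("com.sun.star.frame.Desktop") yields nothing even though
isKnownService() reports that name as known. That is the same shape as
the PoC2 result.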

*I have two questions:*

1. When .rdb files are loaded by the LibreOffice runtime, are their
   contents placed into a single, unified in-memory database, or do they
   remain separate within that memory space? With older binary .rdb
   files, regmerge was used to consolidate data from separate .rdb files.
2. How can we programmatically distinguish a "service name" from an
   "interface name"? I know interfaces often start with 'X', but is there
   a more definitive method than just naming conventions?


On Tue, 17 Jun 2025 at 18:50, Devansh Varshney <
varshney.devansh614 at gmail.com> wrote:

> Hi Everyone,
>
> Thank you Stephan. After your mail, I looked into this more.
> Here's what I found:
> https://cgit.freedesktop.org/libreoffice/core/log/unoidl?h=libreoffice-4.2.8.2&id=72b8e929af5bcfb7d17a74de636fb1ef5204297b&showmsg=1
> (This is in reverse chronological order.)
>
> Commit: 320571bf701a092d0f2d15fd4589ae271802a03f
>
> The cgit logs from 2013, primarily by Stephan Bergmann, document a massive
> refactoring of LibreOffice's core UNO infrastructure.
>
> *The "Old World" (Legacy, pre-2013):*
> LibreOffice inherited a system from OpenOffice.org that used a toolchain
> of idlc, regmerge, and regview.
> - .idl files compiled by idlc into binary .urd (UNO Reflection Data).
> - regmerge then combined these .urd files into a large, complex,
>   legacy-format binary registry file (.rdb).
> - regview was the tool designed to read this old format.
>
> *The "New World" (The 2013 unoidl Refactoring):*
> - Goal: Replace the old, cumbersome system with something more modern,
>   efficient, and easier to maintain.
> - Solution: The unoidl module was created to be the central authority
>   for handling UNO type information.
>
> *The New Tools:*
> - *unoidl-write:* Replaced idlc and regmerge. It compiles .idl files
>   directly into the new, more efficient binary format.
> - *unoidl-read:* Replaced regview. Its specific purpose is to read the
>   new .rdb files and dump their contents in a human-readable, IDL-like
>   format.
> - *unoidl-check:* Replaced regcompare for API compatibility.
>
> *What this history tells us:*
> - *Why regview Failed:* My initial PoC attempts to use regview on a
>   modern LibreOffice build failed because I was using a legacy tool on
>   new files. We were trying to read a Blu-ray with a VHS player.
> - *The Correct Tool:* This confirms that unoidl-read is the correct,
>   modern tool for our goal of getting a static dump of the UNO API types.
> - *RDBs are the Compiled Truth:* It shows that the .rdb files are the
>   canonical, compiled source of truth for UNO types.
>
> *Key Commits that Tell the Story:*
> - "WIP: Experimental new binary type.rdb format" (Stephan Bergmann,
>   Feb/Mar/Apr 2013): Documents new RDB format and unoidl module creation.
>   Explicitly aimed to "ultimately remove modules store and registry."
> - "New unoidl-read tool to translate registries into readable .idl files"
>   (Stephan Bergmann, Sep 2013): Introduces our primary static dumping
>   tool.
> - "New unoidl-check tool to replace regcompare" (Stephan Bergmann,
>   Sep 2013): This further shows the entire legacy toolchain (regcompare,
>   regmerge, idlc, regview) was being systematically replaced by a new,
>   unified unoidl-* toolset.
> - "Revert 'WIP: Experimental new binary type.rdb format'" (multiple
>   times): The log shows this was a complex transition with reverts and
>   re-applications. This is normal for a change of this magnitude and
>   explains why the legacy code and new code had to co-exist.
>
> *Why XML for "Services" RDBs?*
> It turns out services.rdb files are intentionally kept in XML, not by
> accident. Here's why:
>
> 1. Legacy support & human readability: While .rdb files for types
>    switched to new binary, service registries remained XML "for
>    backwards compatibility" [1, 2, 3]. Human-readable XML eases
>    maintenance, debugging, and scripting.
>
> 2. Consistent tooling across UNO bridges: Developers noted that
>    `program/services` .rdb files are XML-based. The Python-UNO bridge,
>    for instance, depends on those XML service definitions; binary would
>    hinder Python tools [4, 5].
>
>
> 3. Consistency with ODF: LibreOffice's file formats (ODT, ODS)
>    are XML-based (ODF). Keeping service registration in XML
>    aligns with this broader architectural philosophy.
>
>
> *TL;DR:*
>
> File Type    | Format | Reason
> -------------+--------+-----------------------------------------------
> types.rdb    | Binary | Efficient, compact, new unoidl-write toolchain
> services.rdb | XML    | Human-readable, backwards compatibility,
>              |        | supports scripting tools like Python-UNO
>
> So, the "mixed-format" nature of the registry is a deliberate and
> pragmatic design choice, balancing performance (for binary types) with
> interoperability and maintainability (for XML services).
>
> *Flowchart: Evolution of UNO RDBs*
>
> *Old World (Pre-2013) UNO Type Processing:*
>
> +----------+     +--------+     +----------+
> | .idl     | --> | idlc   | --> | .urd     |
> | (Source) |     |        |     | (Binary) |
> +----------+     +--------+     +----------+
>                                      |
>                                      v
>                                 +----------+     +-----------+
>                                 | regmerge | --> | Legacy    |
>                                 |          |     | .rdb      |
>                                 |          |     | (Binary)  |
>                                 +----------+     +-----------+
>                                                        |
>                                                        v
>                                                   +---------+
>                                                   | regview |
>                                                   | (Dump)  |
>                                                   +---------+
>
> *New World (2013 Refactoring) UNO Type Processing:*
>
> +----------+     +--------------+     +----------+
> | .idl     | --> | unoidl-write | --> | New      |
> | (Source) |     |              |     | .rdb     |
> +----------+     +--------------+     | (Binary) |
>                                       +----------+
>                                            |
>                                            v
>                                      +-------------+
>                                      | unoidl-read |
>                                      | (Dump)      |
>                                      +-------------+
>
> *Special Case: UNO Service Processing (Remains XML):*
>
> +----------+     +-------------------+     +--------------+
> | Services | --> | XML .rdb files    | --> | Text Editor  |
> | (Config) |     | (e.g., pyuno.rdb) |     | (Human Read) |
> +----------+     +-------------------+     +--------------+
>                            |
>                            v
>              +----------------------------+
>              | Runtime Service Manager    |
>              | (Loads for Component Info) |
>              +----------------------------+
>
> *Our Project's possible Cache Philosophy (Hybrid Approach):*
>
> +-----------------------+     +---------------------------+
> | Static Data Sources   | --> |  Offline Tool             |
> | (UNO APIs, Std Libs)  |     | (like unoidl-write + PoC) |
> +-----------------------+     +---------------------------+
>                                         |
>                                         v
>                                +-----------------------+
>                                | Binary Cache File     |
>                                | (e.g., SQLite .db)    |
>                                +-----------------------+
>                                         |
>                                         v
>                        +------------------------------------+
>                        | IDE Startup: Load Binary Cache     |
>                        | into Master Analyzer (In-Memory)   |
>                        +------------------------------------+
>                                         |
>                                         v
>                        +------------------------------------+
>                        | Dynamic Data (User Code, Vars)     |
>                        | (Analyzed In-Memory by MA)         |
>                        +------------------------------------+
>                                         |
>                                         v
>                        +----------------------------------------+
>                        |    Live IDE Cache (In-Memory Hybrid)   |
>                        | (Can be saved to XML/JSON for session) |
>                        +----------------------------------------+
>
>
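
As a rough illustration of the layering in the diagram above, here is a
minimal sketch in plain C++ of a static base cache overlaid by dynamic,
in-memory entries. Everything in it is an assumption made for
illustration: SymbolInfo, HybridCache and their members are hypothetical
names, the fields are placeholders, and no file loading (SQLite or
otherwise) is shown.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical record; the real symbol structure is not specified here.
struct SymbolInfo {
    std::string name;
    std::string kind;   // e.g. "interface", "function", "variable"
    std::string origin; // "static" (shipped cache) or "dynamic" (user code)
};

// Hypothetical two-layer cache: a stable base plus a live overlay.
class HybridCache {
public:
    // Base layer: would be filled once at IDE startup from the shipped cache.
    void loadStatic(const SymbolInfo& s) { base_[s.name] = s; }

    // Overlay layer: filled from in-memory analysis of the user's code.
    void putDynamic(const SymbolInfo& s) { overlay_[s.name] = s; }

    // Forget everything learned from user code, e.g. when a document closes.
    void clearDynamic() { overlay_.clear(); }

    // Overlay entries take precedence over base entries with the same name.
    std::optional<SymbolInfo> lookup(const std::string& name) const {
        if (auto it = overlay_.find(name); it != overlay_.end())
            return it->second;
        if (auto it = base_.find(name); it != base_.end())
            return it->second;
        return std::nullopt;
    }

private:
    std::map<std::string, SymbolInfo> base_;
    std::map<std::string, SymbolInfo> overlay_;
};
```

Keeping the two layers in separate maps means the dynamic part can be
discarded or rebuilt without touching the static data, which is the
static/dynamic separation the diagram is after.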
>
> *Our Project's Cache Philosophy - Considerations for a Hybrid Approach:*
> Drawing lessons from LibreOffice's RDB evolution, we can consider a
> hybrid cache design for our project. This would address performance needs
> while maintaining flexibility.
>
> *Potential Approaches:*
> - *The Core Static Cache (UNO APIs, Standard Libraries):* For this large
>   amount of relatively stable data, we can consider storing it in a fast,
>   compact, binary format. This could potentially use something like
>   SQLite for efficient querying and retrieval. This is analogous to the
>   new binary types.rdb format, aiming for quick IDE startup.
> - *The Dynamic Cache & User-Specific Data:* Information about the user's
>   currently open modules, local variables, and editor state is highly
>   dynamic. For debugging or saving the IDE's session state, a more
>   readable format like JSON or XML could be beneficial. This is analogous
>   to the XML services.rdb files.
> - *The Hybrid System Concept:* Our Master Analyzer would produce
>   IdeSymbolInfo objects in memory. For persistence, we can consider
>   options to:
>   - Build an offline tool (similar to unoidl-write) using our PoC logic
>     (theCoreReflection, BASIC parser) to generate a comprehensive binary
>     cache file of all shippable UNO and Standard/ScriptForge library
>     info, possibly using SQLite. This file would ideally ship with
>     LibreOffice.
>   - At runtime, the IDE would load this binary cache into memory. The
>     Master Analyzer would then add to or overlay this cache with
>     information from the user's open documents and unsaved changes. This
>     live part might not need disk saving, or could be saved as XML/JSON
>     for session state.
>
> *This ideated approach aims for:*
> - Optimized Startup Performance: From loading a pre-compiled binary
>   cache (e.g., SQLite).
> - Flexibility & Dynamism: From in-memory analysis of live code.
> - Improved Debuggability: From clear static/dynamic separation.
>
> [1] https://listarchives.libreoffice.org/global/dev/2013/msg14613.html
> [2] https://wiki.documentfoundation.org/Documentation/DevGuide/Extensions
> [3] https://docs.libreoffice.org/store.html
> [4] https://www.openoffice.org/udk/python/python-bridge.html
> [5] https://ask.libreoffice.org/t/no-helloworldpython-nor-any-other-python-script-using-appimage/107376/21
>
>     Week 4 mail chain - https://lists.freedesktop.org/archives/libreoffice/2025-June/093392.html
>
>
> I look forward to discussing these considerations and
> potential strategies with the community, and especially with the mentors.
>
>
>
> On Tue, 17 Jun 2025 at 01:32, Stephan Bergmann <
> stephan.bergmann at allotropia.de> wrote:
>
>> On 6/16/25 18:37, Devansh Varshney wrote:
>> > *2. Legacy RDBs*: Interestingly, when I tried to run unoidl-read on some
>> >      other RDBs from workdir/Rdb/ (like pyuno.rdb), I got a different
>> >      error:
>> >
>> > $ unoidl-read $PWD/workdir/Rdb/pyuno.rdb
>> > Bad input <...>: cannot open legacy file: 6
>> >
>> > This confirms the unoidl/README.md note that unoidl::Manager can
>> > detect the old legacy format but may not be able to read all of them
>> > with this specific tool. It's a great insight into the mixed-format
>> > nature of the registry system.
>> Traditionally, the original store-based binary rdb format was used for
>> both "types" files (storing information about UNOIDL entities) and
>> "services" files (storing information about UNO components).  Both those
>> kinds of rdb files have since been changed, using a different binary
>> format for the "types" files and an XML format for the "services" files.
>>   Somewhat confusingly, all those kinds of files still use the ".rdb"
>> extension.
>>
>> unoidl-read can read "types" files (both the old and new binary
>> formats), but not "services" files (the XML format)---and
>> workdir/Rdb/pyuno.rdb is such a "services" file.
>>
>
>
> --
> *Regards,*
> *Devansh*
>


-- 
*Regards,*
*Devansh*