2007/2/16, Joe Shaw <<a href="mailto:joeshaw@novell.com">joeshaw@novell.com</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi,<br><br>Mikkel Kamstrup Erlandsen wrote:<br>> For external filters, I have shamelessly borrowed Beagle's<br>> external-filters.xml<br>> with some modifications. Built-in filters register what MIME types
<br>> they support<br>> when the corresponding dynamic library is loaded.<br>><br>><br>> For reference: <a href="http://beagle-project.org/ExternalFiltersRepository">http://beagle-project.org/ExternalFiltersRepository
</a><br>><br>> It still appears not to cover the two cases I mention - emails in a<br>> database in a hidden directory, indexing of webpages+urls as you browse.<br>> Anyway - a good starting point. Perhaps Joe can shed some light on why
<br>> this was left out..?<br><br>The external filters were added so that people could index file types<br>not supported internally without having to code up support for them.<br><br>The two cases you mention aren't file types, they're data sources.
<br>(Mail is handled by our mail filter, and web pages by our HTML filter<br>already.) </blockquote><div><br>Yes, I was a bit unclear. What I was really trying to say was "You can only specify filters, not data sources."
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">For people who want to index their data externally we provide<br>an indexing service. Apps can do one of two things: they can make an
<br>RPC call and pass in a document and metadata to be indexed, or they can<br>drop the file into ~/.beagle/ToIndex with a control file that describes<br>its metadata and Beagle will automatically index it. (This latter<br>
method is how the Beagle Firefox extension works.)</blockquote><div><br>What kind of RPC is available?<br><br>Dropping files in a special directory sounds like something most indexers could support, so perhaps it could be standardized. Is there a place where I can find documentation, examples, or code for this?
<br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">We could maybe create an external data source backend, but since the<br>sources are so specific, all it would amount to would be calling some
<br>sort of script that did the crawling and used one of the two methods<br>above to signal Beagle. Unlike the external filters, there hasn't been<br>any demand for it, and fitting it into the scheduler so that it didn't
<br>peg the indexer or fill up the disk would be tough to do externally.</blockquote><div><br>I'm not sure I understand what you are saying. Is it that polling many external data source "handles" would be too heavy?
<br><br>Cheers,<br>Mikkel<br></div></div>