Chapter 6. TM4J Indexing Subsystem

Table of Contents

Abstract Indexing Architecture
The Index Interface
The IndexManager Interface
The IndexProvider Interface
The TM4J Basic Indexes
Extending The TM4J Indexes
The Full-Text Index
Using the Lucene FullText Index
Limitations of the Full-Text Index

The TM4J indexing subsystem is designed to meet the basic indexing requirements of most topic map applications 'out-of-the-box'. The architecture lets back-end developers use the features of their back-end to create and query indexes, while still allowing other developers to define and add their own indexes, which may or may not use the same back-end storage.

This chapter describes the basic architecture of the TM4J indexing system and also provides a reference guide to the basic indexes that must be provided by all back-end index implementations.

Abstract Indexing Architecture

The basic architecture of the TM4J indexing system consists of three interacting interfaces: org.tm4j.topicmap.index.Index, org.tm4j.topicmap.index.IndexManager and org.tm4j.topicmap.index.IndexProvider. Developers working with indexes need only know about the Index and IndexManager interfaces. Developers intending to provide or use extension indexes will also need to deal with the IndexProvider interface.

The Index Interface

The Index interface is the common base interface for every index provided by the system. It provides methods to open and close an index and to force a regeneration of the index. Typically each implementation of the Index interface provides access to a single, separate indexed view of the topic map. Individual indexes are defined by extending the org.tm4j.topicmap.index.Index interface to add extra methods for accessing the index contents. For example, the basic index view of associations by their type is defined by the interface org.tm4j.topicmap.index.basic.AssociationTypesIndex, which extends org.tm4j.topicmap.index.Index to provide two additional methods: getAssociationsOfType() and getAssociationTypes().
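The extension pattern described above can be sketched as follows. This is a minimal illustration, not the TM4J source: the Index and AssociationTypesIndex types below are stand-ins that borrow the method names from the text, their parameter and return types are assumptions (plain strings replace topic and association objects), and InMemoryAssociationTypesIndex is a hypothetical implementation invented for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for org.tm4j.topicmap.index.Index. The method names come from the
// text; the exact signatures are assumptions.
interface Index {
    void open();
    void close();
    void reindex();
}

// Stand-in for AssociationTypesIndex. Strings replace the real topic and
// association objects so that the sketch needs no TM4J classes.
interface AssociationTypesIndex extends Index {
    Collection<String> getAssociationsOfType(String type);
    Collection<String> getAssociationTypes();
}

// Hypothetical in-memory implementation over a map of association id -> type id.
class InMemoryAssociationTypesIndex implements AssociationTypesIndex {
    private final Map<String, String> typeByAssociation;
    private final Map<String, List<String>> associationsByType = new LinkedHashMap<>();

    InMemoryAssociationTypesIndex(Map<String, String> typeByAssociation) {
        this.typeByAssociation = typeByAssociation;
    }

    // Opening builds the indexed view; closing discards it.
    @Override public void open() { reindex(); }
    @Override public void close() { associationsByType.clear(); }

    // Rebuilds the view from scratch from the current source data.
    @Override public void reindex() {
        associationsByType.clear();
        for (Map.Entry<String, String> entry : typeByAssociation.entrySet()) {
            associationsByType
                .computeIfAbsent(entry.getValue(), type -> new ArrayList<>())
                .add(entry.getKey());
        }
    }

    @Override public Collection<String> getAssociationsOfType(String type) {
        List<String> found = associationsByType.get(type);
        return found == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(found);
    }

    @Override public Collection<String> getAssociationTypes() {
        return Collections.unmodifiableSet(associationsByType.keySet());
    }
}
```

In this sketch the lifecycle methods live on the base interface and only the two query methods are specific to the association-type view, which is the split the text describes.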

Implementations of a specific index are retrieved from the IndexManager (described below) by specifying the specific index interface that they implement.

The IndexManager Interface

The IndexManager interface is obtained directly from the TopicMap instance (using the method TopicMap.getIndexManager()). The IndexManager interface provides access to all of the indexes which are currently available for the TopicMap instance. To retrieve an index, use the method getIndex(), passing either the full class name of the index interface required or the class of the index interface itself.

Note

It is the interface of the index you require that should be passed as a parameter, not the specific class that implements that interface - this provides code portability across different back-ends which may use different implementations of the same index interface.
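The lookup rule in the note can be illustrated with a small stand-in manager. Everything below is a sketch under stated assumptions: SimpleIndexManager and its register() method are hypothetical, MemoryAssociationTypesIndex is an invented back-end class, and throwing IllegalArgumentException for an unknown name is a choice made for the example rather than documented TM4J behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the TM4J interfaces; signatures are assumptions.
interface Index { void open(); void close(); void reindex(); }
interface AssociationTypesIndex extends Index { }

// Hypothetical back-end specific implementation; callers never name this class.
class MemoryAssociationTypesIndex implements AssociationTypesIndex {
    @Override public void open() { }
    @Override public void close() { }
    @Override public void reindex() { }
}

// Hypothetical manager that keys every index by the name of its interface.
class SimpleIndexManager {
    private final Map<String, Index> indexesByInterfaceName = new HashMap<>();

    <T extends Index> void register(Class<T> indexInterface, T implementation) {
        indexesByInterfaceName.put(indexInterface.getName(), implementation);
    }

    // Lookup by the full class name of the index interface.
    Index getIndex(String interfaceName) {
        Index index = indexesByInterfaceName.get(interfaceName);
        if (index == null) {
            throw new IllegalArgumentException("No index registered for " + interfaceName);
        }
        return index;
    }

    // Lookup by the class object of the index interface.
    Index getIndex(Class<? extends Index> indexInterface) {
        return getIndex(indexInterface.getName());
    }
}
```

With this sketch, getIndex(AssociationTypesIndex.class) and getIndex(AssociationTypesIndex.class.getName()) return the same object, while asking for MemoryAssociationTypesIndex.class fails. Code written against the interface keeps working if a different back-end registers a different implementation.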

As well as allowing indexes to be retrieved, the IndexManager interface can be used to retrieve meta data regarding the index itself. The method getIndexMeta() returns an org.tm4j.topicmap.index.IndexMeta object which may be accessed to determine certain key features of the Index itself. If the IndexMeta object returns true for the isAutomaticallyOpened() method, then the Index instance returned by the IndexManager object does not need to be explicitly opened with a call to the Index.open() method. If the meta data object returns true for the IndexMeta.isAutomaticallyUpdated() method, then the index is designed to keep itself in sync with the topic map as the topic map contents change. If this meta data value is false, then the application should use the Index.reindex() method to resynchronise the index with the topic map contents whenever the latest index information is required.
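One way an application might act on these two flags is sketched below. The Index and IndexMeta types are stand-ins using the method names from the text. The isOpen() method, the IndexPreparation helper, and the FixedIndexMeta and RecordingIndex classes are assumptions introduced for illustration. The sketch also assumes that an index which has just been opened already reflects the current topic map, so it skips the extra reindex() in that case.

```java
// Stand-in for org.tm4j.topicmap.index.Index; signatures are assumptions.
interface Index {
    void open();
    void close();
    boolean isOpen();   // assumed query method, not named in the text
    void reindex();
}

// Stand-in for org.tm4j.topicmap.index.IndexMeta.
interface IndexMeta {
    boolean isAutomaticallyOpened();
    boolean isAutomaticallyUpdated();
}

// Hypothetical helper that prepares an index for querying.
final class IndexPreparation {
    private IndexPreparation() { }

    static void prepare(Index index, IndexMeta meta) {
        boolean openedNow = false;
        // Open explicitly only when the index does not open itself.
        if (!meta.isAutomaticallyOpened() && !index.isOpen()) {
            index.open();
            openedNow = true;
        }
        // Resynchronise only when the index does not track changes itself
        // and was not freshly built by the open() call above.
        if (!meta.isAutomaticallyUpdated() && !openedNow) {
            index.reindex();
        }
    }
}

// Hypothetical fixed-value metadata, for demonstration only.
class FixedIndexMeta implements IndexMeta {
    private final boolean autoOpened;
    private final boolean autoUpdated;

    FixedIndexMeta(boolean autoOpened, boolean autoUpdated) {
        this.autoOpened = autoOpened;
        this.autoUpdated = autoUpdated;
    }

    @Override public boolean isAutomaticallyOpened() { return autoOpened; }
    @Override public boolean isAutomaticallyUpdated() { return autoUpdated; }
}

// Hypothetical index that only counts lifecycle calls, for demonstration.
class RecordingIndex implements Index {
    int opens;
    int reindexes;
    private boolean open;

    RecordingIndex(boolean initiallyOpen) { this.open = initiallyOpen; }

    @Override public void open() { open = true; opens++; }
    @Override public void close() { open = false; }
    @Override public boolean isOpen() { return open; }
    @Override public void reindex() { reindexes++; }
}
```

Calling IndexPreparation.prepare() before each batch of queries keeps the open and reindex decisions in one place, so the calling code does not need to know which back-end supplied the index.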

Note

The in-memory implementations of the various indexes will build an internal index of the topic map whenever the open() method is invoked on a previously closed index or whenever the reindex() method is invoked on the index. For large topic maps, this operation may take a significant amount of time.

The IndexProvider Interface

The org.tm4j.topicmap.index.IndexProvider interface provides a convenient way for developers of extension indexes to package those indexes and to use prepackaged sets of extension indexes. An IndexProvider implementation provides access to one or more Index interfaces for the IndexManager interface managing the indexes for a specific topic map. A new IndexProvider instance is registered with the IndexManager by calling the method IndexManager.registerIndexProvider(). When this method is called, the IndexManager will invoke the IndexProvider.initialise() method, passing in a reference to the TopicMap to be indexed. The IndexProvider notifies the IndexManager of the names of the Index interfaces it provides by implementing the IndexProvider.getIndexNames() method to return the names in an array of Strings. Finally, the IndexProvider must return an Index instance in response to a call to IndexProvider.getIndex() and the meta data for the index in response to IndexProvider.getIndexMeta() (each of these methods is invoked with the full class name of the index interface of interest).
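The registration sequence can be sketched end to end as follows. All types are stand-ins and every signature is an assumption. The metadata lookup on the provider is named getIndexMeta() here, mirroring IndexManager.getIndexMeta(). TopicCountIndex, TopicCountProvider, SimpleIndexManager and the topicCount() accessor on the TopicMap stand-in are hypothetical names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the TM4J types named in the text; signatures are assumptions.
interface TopicMap { int topicCount(); }   // hypothetical accessor
interface Index { void open(); void close(); void reindex(); }
interface IndexMeta { boolean isAutomaticallyOpened(); boolean isAutomaticallyUpdated(); }

interface IndexProvider {
    void initialise(TopicMap topicMap);
    String[] getIndexNames();
    Index getIndex(String interfaceName);
    IndexMeta getIndexMeta(String interfaceName);
}

// Hypothetical extension index: caches the number of topics in the map.
interface TopicCountIndex extends Index { int getTopicCount(); }

class TopicCountProvider implements IndexProvider {
    private static final String NAME = TopicCountIndex.class.getName();
    private TopicCountIndex index;

    @Override public void initialise(final TopicMap topicMap) {
        index = new TopicCountIndex() {
            private int count;
            @Override public void open() { reindex(); }
            @Override public void close() { count = 0; }
            @Override public void reindex() { count = topicMap.topicCount(); }
            @Override public int getTopicCount() { return count; }
        };
    }

    @Override public String[] getIndexNames() { return new String[] { NAME }; }

    @Override public Index getIndex(String interfaceName) {
        return NAME.equals(interfaceName) ? index : null;
    }

    @Override public IndexMeta getIndexMeta(String interfaceName) {
        if (!NAME.equals(interfaceName)) {
            return null;
        }
        // This index must be opened and resynchronised by the caller.
        return new IndexMeta() {
            @Override public boolean isAutomaticallyOpened() { return false; }
            @Override public boolean isAutomaticallyUpdated() { return false; }
        };
    }
}

// Stand-in for the registration side of IndexManager.
class SimpleIndexManager {
    private final TopicMap topicMap;
    private final Map<String, IndexProvider> providersByIndexName = new HashMap<>();

    SimpleIndexManager(TopicMap topicMap) { this.topicMap = topicMap; }

    void registerIndexProvider(IndexProvider provider) {
        provider.initialise(topicMap);                   // hand over the map to index
        for (String name : provider.getIndexNames()) {   // record what it offers
            providersByIndexName.put(name, provider);
        }
    }

    Index getIndex(String interfaceName) {
        IndexProvider provider = providersByIndexName.get(interfaceName);
        return provider == null ? null : provider.getIndex(interfaceName);
    }

    IndexMeta getIndexMeta(String interfaceName) {
        IndexProvider provider = providersByIndexName.get(interfaceName);
        return provider == null ? null : provider.getIndexMeta(interfaceName);
    }
}
```

After registerIndexProvider(new TopicCountProvider()), a caller can fetch the index with getIndex(TopicCountIndex.class.getName()), exactly as it would for a built-in index. Because the metadata reports that the index is neither automatically opened nor automatically updated, the caller is expected to call open() first and reindex() after the topic map changes.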